<<

Int. J. Dev. Biol. 58: 533-549 (2014) doi: 10.1387/ijdb.140080nk

www.intjdevbiol.com

The Lophotrochozoan TGF-b signalling cassette - diversification and conservation in a key signalling pathway NATHAN J. KENNY1,3, ERICA K.O. NAMIGAI1, PETER K. DEARDEN2, JEROME H.L. HUI3, CRISTINA GRANDE4 and SEBASTIAN M. SHIMELD*,1

1Evolution and Development Research Group, Department of Zoology, University of Oxford, UK, 2Laboratory for Evolution and Development, Genetics Otago and Gravida, The National Centre for Growth and Development, Biochemistry Department, University of Otago, Aotearoa, New Zealand, 3School of Life Sciences, Chinese University of Hong Kong, Shatin, Hong Kong and 4Departamento de Biología Molecular and Centro de Biología Molecular "Severo Ochoa", CSIC-Universidad Autónoma de Madrid, Madrid, Spain

ABSTRACT TGF-b signalling plays a key role in the patterning of metazoan body plans and growth. It is widely regarded as a ‘module’ capable of co-option into novel functions. The TGF-b pathway arose in the Metazoan lineage, and while it is generally regarded as well conserved across evolu- tionary time, its components have been largely studied in the and Deuterostomia. The recent discovery of the Nodal molecule in molluscs has underlined the necessity of untangling this signalling network in lophotrochozoans in order to truly comprehend the evolution, conservation and diversification of this key pathway. Three novel genome resources, the molluscPatella vulgata, Pomatoceros lamarcki and Brachionus plicatilis, along with other publicly available data, were searched for the presence of TGF-b pathway genes. Bayesian and Maximum Likelihood analyses, along with some consideration of conserved structure, was used to confirm gene identity. Analysis revealed conservation of key components within the canonical pathway, allied with extensive diversification of TGF-b ligands and partial loss of genes encoding pathway inhibi- tors in some lophotrochozoan lineages. We fully describe the TGF-b signalling cassette of a range of lophotrochozoans, allowing firm inference to be drawn as to the ancestral state of this pathway in this Superphylum. The TGF-b signalling cascade’s reputation as being highly conserved across the Metazoa is reinforced. Diversification within the activin-like complement, as well as potential wide loss of regulatory steps in some Phyla, hint at specific evolutionary implications for aspects of this cascade’s functionality in this Superphylum.

KEY WORDS: TGF-b, , BMP, Activin, signalling

Introduction TGF-b cassette (Herpin et al., 2004, Matus et al., 2006, Adamska et al., 2007, Huminiecki et al., 2009). According to such analysis The TGF-b signalling pathway (Fig. 1A) has been well studied it is known that the original Metazoan repertoire consisted of at in a wide variety of traditional model systems, and is regarded least four TGF-b receptors and four Smads (Suga et al., 1999, as a ‘module’ (Wagner 1996) capable of regulating homeostasis, Huminiecki et al., 2009), although the extent of the original ligand growth, and differentiation in a range of contexts (Derynck and cassette and regulatory repertoire remains uncatalogued. While Miyazono 2008, Moustakas and Heldin 2009, Massagué 2012). While the TGF-b pathway most likely arose in its canonical form in the metazoan common ancestor (Pang et al., 2011), its diver- Abbreviations used in this paper: BAMBI, BMP and activin membrane-bound inhi- gence across the Metazoa and the ancestral roles played by its bitor; TGF, transforming growth factor; BMP, bone morphogenic protein; JTT, components are still in many ways unknown. Jones-Taylor-Thornton; ML, maximum likelihood; R-Smads, receptor-regulated Attempts have been made to categorise the ancestral metazoan Smads; Tsg, twisted gastrulation; WAG, Whelan and Goldman.

*Address correspondence to: Sebastian M. Shimeld. Evolution and Development Research Group, Department of Zoology, University of Oxford, South Parks Road, OX1 3PS, UK. Tel: +44-(0)-1865-281994. E-mail: [email protected] - web: http://zoo-shimeld.zoo.ox.ac.uk/home

Supplementary Material (one table and one figure) for this paper is available at: http://dx.doi.org/10.1387/ijdb.140080nk

Accepted: 20 June 2014.

ISSN: Online 1696-3547, Print 0214-6282 © 2014 UBC Press Printed in Spain 534 N.J. Kenny et al. fully-fledged TGF-b signalling components have not yet been 2007). TGF-b ligand binding to receptor serine/threonine kinases found outside the Metazoa, the choanoflagellateMonosiga brevi- can also be up- or down - regulated at the cell surface by mem- collis has been shown to possess a gene with an MH2 domain brane anchored co-receptors and receptors (Shi and Massagué similar to that of a Smad class protein (Srivastava et al., 2010, 2003). Some co-receptors, such as Cripto and the EGF-CFC class Pang et al., 2011). Both the placozoan Trichoplax adhaerens and of genes, allow active ligand-receptor complexes to be formed by the ctenophore Mnemiopsis leidyi have a fairly complete central acting as cofactors and are vital for the function of some ligands signalling pathway complement (Humineicki et al., 2009, Pang et (Cheng et al., 2003, Shen and Schier 2000). Down-regulation can al., 2011), and possess at least the basic receptor and be performed by pseudoreceptors such as BMP and membrane Smad complements found in the (Suga et al., 1999). The bound inhibitor (BAMBI, also known as Nma), which compete evolution of some elements of the cassette is less well understood, with functional Type I receptors for ligand binding (Onichtchouk particularly those components that act to modulate signalling. et al., 1998). The existence of these regulatory mechanisms has The TGF-b signalling pathway has been well studied in tradi- been noted in previously (van der Zee et al., 2008), tional ecdysozoan (van der Zee et al., 2008) and but the degree to which these are conserved across the Bilateria (Massagué et al., 2000) model systems such as Drosophila me- is unknown, and the possibility that these regulatory mechanisms lanogaster and Mus musculus. The few attempts that have been have diversified, changed in function or have been lost in some made at categorizing elements of the lophotrochozoan TGF-b lineages is yet to be explored. cassette have typically been limited to individual elements and/ Further intracellular regulation of TGF-b signalling also occurs. or single species (for example, Herpin et al., 2005, Freitas et al., FKBP12, Dad/SMAD7 recruited E3 ubiquitin ligases and SMAD 2007, Kuo and Weisblat 2011). Discoveries such as that of Nodal ubiquitination regulatory factors (SMURFs) can up- and down- in the (Grande and Patel 2009) have led to further inter- regulate signalling within the cell (Shi and Massagué 2003, Itoh est in the true pattern of conservation and diversification of genes and ten Dijke 2007). These, and other regulatory mechanisms, in this pathway. With the genomic resources now on-hand we often participate in other signalling cascades. The full repertoire should be able to trace its ancestral form and function, at least of regulatory interactions with the TGF-b cascade is beyond the across the Bilateria. scope of this manuscript, and we refer the interested reader to the In essence while the TGF-b pathway regulates a number of detailed reviews available on this topic (e.g. Shi and Massagué highly complex processes in tissues its core mode of action 2003, Moustakas and Heldin 2009, Al-Salihi et al., 2012). is simple, and can be seen schematically in Fig. 1A. TGF-b ligands For all its importance and ubiquity, the TGF-b pathway within form dimers and bind sequentially to Type II and Type I receptors, the Lophotrochozoa has yet to be satisfactorily documented. Some which form a complex and are phosphorylated. Upon activation of studies have found evidence of diversification at the ligand level the Type I receptor within the signalling complex, receptor-regulated within the leech Helobdella robusta (Kuo and Weisblat 2011) and Smads (R-Smads) are recruited from the cell membrane with the platyhelminthes (Gavino and Reddien 2011, Freitas et al., 2007), aid of proteins such as Smad anchor for receptor activation (SARA) while other aspects of the signalling pathway may be conserved (Itoh and ten Dijke 2007). R-Smads are then phosphorylated and (Molina et al., 2011) although this is unclear (Kuo and Weisblat activated by the receptor complex (Massagué et al., 2005). R-Smads 2011). The identification of the full signalling complements of a can be divided into two families, depending on the ligands to which number of lophotrochozoan species should provide a springboard they respond– Mad/Smad 1/5/8 responds to BMP signalling, and for the disentanglement of this network. Smox/Smad 2/3 to TGF-b, Activin, and Nodal signalling (Heldin and Here we present a comprehensive catalogue of the compo- Moustakas 2012). Once activated, R-Smads bind to a co-Smad nents of TGF-b signalling in members of the lophotrochozoan (Medea/Smad 4) to form a complex that mediates transcription in Superphylum (Fig. 1B), allowing for the first time a Metazoa-wide the nucleus, resulting in up- and down-regulation of target genes understanding of the evolution and divergence of this crucial signal- (Ross and Hill 2008). Inhibitory Smads (known as Dad or Smad ling pathway. We investigated the genomes of the mollusc Lottia 6/7), compete with R-Smads for activation by receptor complexes, gigantea, the annelid Capitella teleta (Simakov et al., 2013), the thus regulating the pathway (Ross and Hill 2008). bdelloid rotifer Adenita vaga (Flot et al., 2013) and the planarian Complexity is introduced to the TGF-b signalling cascade by the Schistosoma mansoni (Berriman et al., 2009), along with tran- diversity of regulatory mechanisms which modulate it both extra- scriptomic and novel genomic resources for the monogont rotifer and intra-cellularly, and which are perhaps more free to vary than Brachionus plicatilis, limpet mollusc Patella vulgata and the serpulid the core signalling cascade. TGF-b ligands can be removed from annelid Pomatoceros lamarcki. We demonstrate the existence of the extracellular environment by ligand traps, such as the Chor- the majority of the core TGF-b cassette in the Lophotrochozoa, dins, Noggins and DANs (Balemans and Van Hul 2002). These confirming the conservation of this ‘module’ across evolutionary are vital for many aspects of development, including the correct time, albeit with extensive diversification of the ligand complement specification of dorsoventral polarity, and have been catalogued in in this lineage and loss of genes encoding extracellular inhibitors ecdysozoans and (Holley et al., 1995). It has been in some species. observed that ecdysozoans have less diversity in these protein classes than models (van der Zee et al., 2008). Tolloid, Results a zinc metalloprotease, is capable of cleaving Chordin, hence re- releasing the trapped ligand, as well as cleaving other potential TGF-b ligands repressors of TGF-b signalling, such as proteoglycans (Scott et TGF-b ligands participate in a diverse range of mechanisms in al., 1999). This plays a key role in establishing the body plan of cellular specification and functionality. They are synthesized as early embryos, including those of lophotrochozoans (Herpin et al., relatively long precursor proteins, but an N-terminal propeptide The Lophotrochozoan TGF-b signalling cassette 535 is cleaved during processing, leaving a short, 110-140 amino Whelan and Goldman (WAG) model, 1000 bootstrap replicates). acid mature ligand. The mature ligand can be recognized by the To better ascertain phylogenetic relationships within and between characteristic conservation of at least six cysteine residues that TGF-b ligands, we have analysed the interrelationships of the when folded form a structure known as a cysteine knot, stabilized TGF-b class in two steps - firstly, by assigning, on the basis of by three disulfide bonds (ten Dijke and Arthur 2007). the preliminary tree shown in Supplementary Fig. 1, and by Blast There are 33 distinct genes encoding TGF-b ligands in humans, identity (Altshul et al., 1990), ligands to either the BMP class or seven in D. melanogaster and five inCaenorhabditis elegans (Hu- the Activin/TGF-b class, and secondly by analysing these two mineicki et al., 2009). These ligands can be split into two broad classes separately. classes, the TGF-b/Activin/Myostatin class (generally referred The results of maximum likelihood and Bayesian inference of to here as the TGF-b class) and the BMP (bone morphogenetic their phylogenetic relationships for the BMP class can be seen protein) class (Yamamoto and Oelgeschläger 2004). Generally in Fig. 2. Both means of phylogenetic inference recover clear these classes are mirrored by the signalling pathway through which familial relationships between lophotrochozoan sequences and they operate (see Fig. 1A for details) but this is not always the orthologues of many well-described genes. Bootstrap values are case - Nodal, for instance, is generally said to belong to the BMP not always high, most likely due to the relatively short dataset (88 class, but Nodal signals are transduced via the TGF-b pathway. amino acid alignment) from which these samples were drawn. We should note that nomenclature regarding whether these are Posterior probabilities, however, clearly support many nodes on ‘families’ or ‘classes’ varies in the literature. In this study we have the Bayesian tree; for example, the ADMP clade has a posterior followed the Linnaean convention, where classes are broader probability of 1 under Bayesian analysis, but bootstrap support of groupings – either TGF-b or BMP-related, while families refer to 71 under ML analysis. groupings of orthologous genes within the class set. The existence and expression of several members of the BMP In many ways the complements of TGF-b ligands found in class of the TGF-b ligand superclass in the Lophotrochozoa has lophotrochozoan genomes are similar to those found in more already been established by prior studies, although the phyloge- classical model organisms. This similarity breaks down, however, netic distribution of these studies has been scattered and the full in the TGF-b class, which appears to have undergone extensive lophotrochozoan complement was unclear (e.g. Nederbragt et divergence in this Superphylum. Initial attempts at making phylo- al., 2002, Freitas et al., 2007, Grande and Patel 2009, Kuo and genetic trees for TGF-b ligands were confounded by the diversity Weisblat 2011). It seems that and molluscs have retained of ligand sequence in this class, which resulted in poor alignments the majority of the diversity of the BMP class found in the Ecdyso- and multiple gaps, and, ultimately, poorly-supported trees. Supple- zoa and Deuterostomia, and in many cases better conservation mentary Fig.1 shows one such tree (Maximum Likelihood (ML), is found than in ecdysozoan models. In contrast, B. plicatilis and,

BMP Pathway Lophotrochozoa Platyhelminthes A Activin/Myostatin B Activin Dominated Nodal Noggin Gastrotricha Myostatin Chordin BMP 2/4 Follistatin Hairy-bellied Worms Vg1 Tolloid DAN Family BMP 5-8 DAN Family TGF Follistatin BMP 3 (ADMP) Tsg GDF 6 Rotifera ALP Wheel

Gnathostomulida Jaw Worms (Ectoprocta) Moss Animals Type 2 Type 2 Activin R2 Type 1 Type 1 BMP R2 (Punt) ALK 1/2/3/6 (Wishful ALK 4/5/7 Extracellular Space Goblet Worms (TGF 1/ (ActR1/ Thinking) Baboon) BAMBI BMP R1/ Tkv/Sax) Mollusca Snails, , Bivalves P Cytosol P Annelida Ringed, Spoon and Peanut Worms Smad 2/3 Smad 6/7 Smad 1/5/8 Ribbon Worms

P SMURF P ,

Smad 4 Ecdysozoa , , Priapulids Deuterostomia , Sea Stars, Sea Squirts Diploblastica Nucleus Transcriptional Regulation

Fig. 1. Summary of TGF-b/BMP signalling cascades and Lophotrochozoan interrelationships. (A) Canonical TGF-b/BMP signalling cascades: representation of the canonical signalling pathways for TGF-b-like and BMP-like cascades with inhibitors of signalling shown in red and operational signalling shown in black. Only ligands with well-known affinity to one or other signalling pathway listed, with each pathway operating through different combinations of Type I and Type II receptors, and hence signalling through either Smad 2/3 or Smad 1/5/8 proteins intracellularly. Ligands are regulated extracellularly by a diverse range of inhibitors, which can themselves be cleaved by Tolloid to release ligands and allow signalling to occur. Intracellular regulation of signalling can occur at the receptor level, with BMP and activin membrane-bound inhibitor (BAMBI) recruited in the place of functional Type I receptors, or intracellularly, through Smad 6/7 inhibition of signal transduction from receptors, SMURF-mediated degradation of Smad signalling proteins, or a range of further mechanisms not shown here. (B) Cladogram representing lophotrochozoan (boxed) and metazoan inter-relationships, based on Dunn et al., (2008). Non-bilaterally symmetrical metazoans are represented by the paraphyletic group “Diploblastica”. 536 N.J. Kenny et al.

in particular, S. mansoni, have apparently lost many of the more these are not recovered as a monophyletic grouping in Fig. 2, they BMP-like members of the TGF-b ligand cassette. The criteria used have appeared in previous studies as a well-supported clade of as a basis for assignations of genes to particular families can be deuterostome-specific genes, for example in Lapraz et al., (2006). seen in the Materials and Methods section. The H. robusta BMP 2/4b sequence (AEL12442.1) has also been The apparent Ecdysozoan-specific loss of Nodal and BMP3/ noted as showing some resemblance to the Univin/VG-1/GDF1/3- GDF10 is corroborated by our analysis, but we recover well-sup- like family. An annelid gene, which we have named “UNKNOWN” in ported nodes containing lophotrochozoan orthologues for these Fig. 2, groups with poor support with the GDF1/3 family in Bayesian gene families. For more detailed exploration of Nodal ligands, we trees. Better sampling is needed across the Lophotrochozoa in refer the interested reader to Grande et al., 2014 in this issue. More order to confirm whether these are truly orthologous to the Vg1/ complex is the case of the Univin/VG-1/GDF1/3-like family. While Univin/GDF1/3 genes in the Deuterostomia.

99 Mus musculus GDF7 NP038555.1 Mus musculus99 Mus GDF6 musculus NP038554.1 GDF7 NP038555.1 Mus musculus GDF7 NP038555.1 A 34 B 1 59 Mus musculus GDF6 NP038554.1 Mus musculus GDF7 NP038555.1 Mus34 musculus GDF5 NP032135.2 0.8M5 us musculus GDF6 NP038554.1 42 59 0.71 1 Nematostella vectensisMus musculus GDF5 GDF5 AAR27581.1 NP032135.2 Mus m0u.8scM5 ulsu ms GuscDFu5lu NsP G032135DF6 N.2P038554.1 42 0.71 Branchiostoma Nematostella floridae GDF vectensis 567 XP002602867.1 GDF5 AAR27581.1 0.98 Nematostella vecMtuesn smisu GscDuFlu5s AAR GDF275815 NP032135.1 .2 0.98 26 Capitella Branchiostoma teleta BMP10 floridae35187 GDF 567 XP002602867.1 Branchiostoma floNreidmaeat GosDtFell 567a vec XPte002602867nsis GDF.51 AAR27581.1 32 Branchiostoma floridae GDF 567 XP002602867.1 26 Lottia32 gigantea Capitella BMP10 teleta 111943 BMP10 35187 Capitella teleta BMP10 35187 40 0.87 0.88 98 Lottia gigantea BMP10 111943 LottiaC gaipgiatenlltaea t eBleMtaP B 10M 11194P10 35183 7 Patella40 vulgata BMP10 0.87 1 0.92 0.88 Apis mellifera98 BMP10Patella vulgataXP001120039.1 BMP10 Patella1 vulgatLoa BttMiaP g1i0gantea BMP 10 111943 80 0.92 Patella vulgata BMP10 Mus musculus Apis BMP10 mellifera NP033886.2 BMP10 XP001120039.1 1 Apis mellifera BMP10 XP001120039.1 80 1 Apis mellifera BMP10 XP001120039.1 84 Mus musculus Mus musculus BMP9 AAD56961.1 BMP10 NP033886.2 1 Mus musculus BMP10 NP033886.2 84 Mus musculus BMP10 NP033886.2 Capitella teleta BMP5-8 Mus musculus 172350 BMP9 AAD56961.1 Mu1s musculus BMP9 AAD56961.1 Capitella teleta BMMPu5s-8 m 17235uscu0lus BMP9 AAD56961.1 21 Mus musculus Capitella BMP8 teleta NP031584.1 BMP5-8 172350 Lottia gCiagpaintetealla BteMlePta5 -B8M 19588P5-82 172350 97 Lottia 2gigantea1 Mus BMP5-8 musculus 195882 BMP8 NP031584.1 1 Patella vulgLoattta iBa MgiPg5a-n8tea BMP5-8 195882 44 Patella vulgata97 Lottia BMP5-8 gigantea Locus BMP5-8 26861 195882 0.69 0.85 1 5 44 Saccog0.69lo0ss.85usP kaotewllaal evskvulgiia tBaM BPM5P-85 -N8P_001158388.1 18 Saccoglossus Patella kowalevskii vulgata BMP5-8 BMP5-8 NP_001158388.1 Locus 26861 5 Mus muSscacculuogs lBoMssPu6s NkoPw031582alevsk.1ii BMP5-8 NP_001158388.1 17 18 Mus musculus Saccoglossus BMP6 NP031582.1 kowalevskii BMP5-8 NP_001158388.1 0.68 0.32 8 Mus muscuMlus BmMuscP7u NluPs031583 BMP6. 2NP031582.1 94 66 17 Mus musculus BMP6 NP031582.1 0.24 0.62 0.68 0.32 Mus musculus8 BMP7 NP031583.2 0.99 Mus musculus BMP7 NP031583.2 94 66 0.24 Mus0 m.62usculus BMP8 NP031584.1 Mus musculus Mus BMP5 musculus NP031581.2 BMP7 NP031583.2 0.97 0.44 0.99 14 Mus musculus BMPM5 uNsP m031581uscul.u2s BMP8 NP031584.1 Mus musculus BMP5 NP031581.2 0.97 0.44 Branchiostoma14 floridae BMP5-8 C3ZTY2 BranchiostomMau fslo mriudscaeu BluMsP B5M-8P C53 NZTP031581Y2 .2 Branchiostoma floridae BMP5-8 C3ZTY2 16 Apis mellifera Gbb XP394252.1 Branchiostoma floridae BMP5-8 C3ZTY2 25 0.97 Apis mellifera Gbb XP394252.1 Drosophila16 Apis melanogaster mellifera Gbb Gbb XP394252.1 AAF47075.1 25 0.22 0.46 Dro0s.9oph7 Apilias melllianogfera asGbbter XPGbb394252 AAF.470751 .1 17 Drosophila melanogaster Gbb AAF47075.1 9 Nematostella vectensis BMP5-8 XP 001638358.1 0.607.22 Nema0t.o46stella vecDternosiophs BMilaP m5-e8l aXPnog 001638358aster Gbb.1 AAF47075.1 17 Pomatoceros9 Nematostella lamarckii BMP5-8vectensis BMP5-8 XP 001638358.1 Po0.m67atoceNreoms alatomsatreckllaii vec BMtePn5s-8is BMP5-8 XP 001638358.1 Saccoglossus kowalevskii Pomatoceros Univin lamarckii ACY92678.1 BMP5-8 Pomatoceros lamarckii BMP5-8 0.99 Saccoglossus kowalevskii Univin ACY92678.1 71 Saccoglossus kowalevskii Univin ACY92678.1 Branchiostoma floridae Univin XP002589220.1 Bra0n.9c9hioSsacctomogal oflssoruidsae ko Uwnailvevskin XPii 002589220Univin AC.1Y92678.1 71 16 38 Pomatoceros lamarckii Branchiostoma BMP2/4 floridae Univin XP002589220.1 Branchiostoma floridae Univin XP002589220.1 1Lottia gigantea BMP2/4 205842 Capitella16 38 teleta Pomatoceros BMP2/4 173895 lamarckii BMP2/4 Patella vulgLoattta iBa MgiPg2a/n4t AeaF B44009MP27/4 205842 0.31 1 Capitella teleta BMP2/4 173895 0.25 Patella vulgata BMP2/4 AF440097 44 Helobdella robusta BMP2/4b AEL12442.1 0.26 00..3219Mus musculus BMP4 AAC37698.1 44 Helobdella robusta BMP2/4b AEL12442.1 0.25 Mus musculus BMP4 AAC37698.1 Helobdella robusta BMP2/4a AEL12441.1 0.26 0.38Mus musc0u.2l9us BMP2 NP031579.2 Helobdella robusta BMP2/4a AEL12441.1 Mus musculus BMP2 NP031579.2 99 Lottia gigantea BMP2/4 205842 0.44 0.86Saccogl0o.3ss8 us kowalevskii BMP2/4 NP001158387.1 13 6 Patella vulgata99 Lottia BMP2/4 gigantea AF440097 BMP2/4 1 205842 0.4N4 e0m.86Sataccostogelllao vecssutse nkosiws aBlevskMP2ii/4 B AARMP213362/4 NP001158387.1 .1 13 5 6 0.98 Nematostella vectensis BMP2/4 AAR13362.1 Drosophila melanogasterPatella vulgata Dpp BMP2/4 AAN10431.1 AF440097 1 0.96 Apis mellifera Dpp XP001122815.2 5 0.98 Apis mellifera Drosophila Dpp XP001122815.2 melanogaster Dpp AAN10431.1 Dros0oph.96 ilAap mise mlanogellifeasrat eDrpp Dpp XP AAN00112281510431.21 0.59 Drosophila melanogaster Dpp AAN10431.1 36 Mus musculus Apis BMP2 mellifera NP031579.2 Dpp XP001122815.2 Branchiostoma floridae BMP 2/4 AAC97488.1 57 0.59 Branchiostoma floridae BMP 2/4 AAC97488.1 Mus musculus36 Mus BMP4 musculus AAC37698.1 BMP2 NP031579.2 Pomatoceros lamarckii BMP2/4 57 0.25 Pomatoceros lamarckii BMP2/4 Nematostella Mus musculus vectensis BMP4BMP2/4 AAC37698.1 AAR13362.1 Capitella teleta BMP2/4 173895 10 0.42 0.25 HCealpobditelleall ate rlobueta BstMa PB2M/4P 173892/4b A5EL12442.1 Saccoglossus10 kowalevskii Nematostella BMP2/4 vectensis NP001158387.1 BMP2/4 AAR13362.1 0.91 0.42 3 HelobdellHae rlobdobuesllta BrobuMP2s/t4aa B AMEPL212441/4b A.1EL12442.1 23 Branchiostoma3 Saccoglossus floridae BMP2/4 kowalevskii AAC97488.1 BMP2/4 NP001158387.1 0.47 0.91 23 Mus musculus GDF3H NelPobd032134ella .r2obusta BMP2/4a AEL12441.1 Mus musculus Branchiostoma BMP3 NP775580.1 floridae BMP2/4 AAC97488.1 0.47 0.89 5 0.89 Mus Mmussc muuluscs uGlDusF 1G NDPF0321333 NP032134.2 .2 53 Capitella Mus teleta musculus BMP3 184704 BMP3 NP775580.1 0.47 5 Mus musculus GDF1 NP032133.2 53 0.47 Pomatoceros lamarckii UNKNOWN 17 Pomatoceros Capitella teleta lamarckii BMP3 BMP3 184704 1 13 1 CapitellaP toemletaat oUNKNceros OlaWmNar ck2952ii UNKN9 OWN 17 Brachionus Pomatoceros plicatilis BMP3 lamarckii BMP3 42 13 Mus musculus BMP3 NP775580.1 Capitella teleta UNKNOWN 29529 Brachionus plicatilis BMP3 Branchiostoma42 japonicum BMP3 ACF93445.1 0.66 Mus musculus BMP3 NP775580.1 0.8 Capitella teleta BMP3 184704 8 Saccoglossus Branchiostoma kowalevskii BMP3 japonicum XP_002735398.1 BMP3 ACF93445.1 0.66 0.94 0.8 CPaopmitealltoa ceterleotsa lBamMaPr3ck 18470ii BM4P3 8 Helobdella Saccoglossus robusta ADMP kowalevskii AEL12443.1 BMP3 XP_002735398.1 0.95 Pomatoceros lamarckii BMP3 0.95 0.94 Brachionus plicatilis BMP3 90 Helobdella robusta ADMP AEL12443.1 0.96 Patella vulgata ADMP 0.2 Branchiostoma japonicum BMP3 ACFB93445rach.i1onus plicatilis BMP3 71 90 Patella vulgata ADMP 0.96 Lottia gigantea ADMP 110168 0.2 Saccoglossus Bkorawnaclhevskiostiio BmMa Pja3pon XP_002735398icum BMP3.1 ACF93445.1 71 Lottia gigantea ADMP 110168 0.95 64 Danio rerio ADMP NP571951.2 HeSlobdaccogellalo rssobuuss ktoa wADalevskMP Aii EBLM12443P3 XP.1_002735398.1 1 0.95 32 Capitella64 teleta DanioADMP rerio 184506 ADMP NP571951.2 Patella vulgata ADMHPelobdella robusta ADMP AEL12443.1 12 1 1 1 32 Capitella teleta ADMP 184506 Patella vulgata ADMP Saccoglossus12 kowalevskii ADMP NP_001158394.1 Lo1 ttia gi1gantea ADMP 110168 6 25 Branchiostoma Saccoglossus floridae ADMP kowalevskii XP002604737.1 ADMP NP_001158394.1 0.8 Danio rerLoio ttADia MgiPg aNnPtea571951 ADM.2P 110168 6 25 0.8 Nematostella vectensis Branchiostoma ADMP related floridae AFP87424.1 ADMP XP002604737.1 0.42Capitella teleta ADDanMioP r18450erio AD6 MP NP571951.2 Nematostella vectensis ADMP related AFP87424.1 0.67 0.42Capitella teleta ADMP 184506 Capitella teleta Nodal 110325 0.45 Saccoglossus kowalevskii ADMP NP_001158394.1 79 Capitella teleta Nodal 110325 0.67 Saccoglossus kowalevskii ADMP NP_001158394.1 Pomatoceros79 lamarckii Nodal Branc0h.4i5ostoma floridae ADMP XP002604737.1 Lottia gigantea Pomatoceros Nodal ACB42423.1 lamarckii Nodal 0.31 Nematostella vectensis ADMPB raenlacthedio AstFoPm87424a flor.1idae ADMP XP002604737.1 70 0.31 14 Nematostella vectensis ADMP related AFP87424.1 99 Patella70 vulgata Lottia Nodal gigantea 2 Nodal ACB42423.1 Capitella teleta Nodal 110325 14 1 97 Patella9 9vulgata Patella Nodal vulgata 1 Nodal 2 PomaCtoapceitreollsa ltaemleatrack Niiod Naodl 11032al 5 32 0.95 1 Brachionus97 plicatilisPatella vulgata Nodal Nodal 1 Lottia gigaPnoteam aNtoodcearlo ACBs lam42423arckii.1 Nodal 32 1 0.95 17 0.41 Patella Lovulttgiaat ag iNgodantaeal 2 Nodal ACB42423.1 Branchiostoma floridae Brachionus Nodal AAL99367.1 plicatilis Nodal 1 1 17 0.41 Patella vulgata Nodal 2 Mus musculus Branchiostoma Nodal AAI28019.1 floridae Nodal AAL99367.1 0.63 Patella vu1lgata Nodal 1 23 70 Brachionus plicaPtailitesll Na odvualgl ata Nodal 1 23 Saccoglossus70 Mus kowalevskii musculus Nodal Nodal C AAI28019.1 NP001164721.1 0.63 0.87 Branchiostoma floridae NBodracal hAAionuL99367s plica.1tilis Nodal 87 Saccoglossus kowalevskiiSaccoglossus Nodal kowalevskii A ACY92597.1 Nodal C NP001164721.1 0.87 Mus musculuBs rNaodnchali oAAstoI28019ma flo.1ridae Nodal AAL99367.1 42 Saccoglossus87 Saccoglossus kowalevskii kowalevskiiNodal B NP001161612.1 Nodal A ACY92597.1 0.92 1 42 0.92 SaccoglossMus mkouwscaluevsklus iiN Nododala AAl C IN28019P001164721.1 .1 Mus musculus GDF3 Saccoglossus NP032134.2 kowalevskii Nodal B NP001161612.1 0.98 1 Saccoglossus kowSaacclevskogiilo Nssodusa lk Aow ACalevskY92597ii N.od1 al C NP001164721.1 61 Mus musculus Mus musculus GDF1 NP032133.2 GDF3 NP032134.2 0.98 0.98 61 Saccog0l.o98ssus Skoaccwaoglevsklossii uNsod koawl Bal evskNP001161612ii Nodal A.1 ACY92597.1 45 Capitella Mus musculus teleta GDF2/Maverick GDF1 NP032133.2 38881 18 SaccoglossusC kaopwitaelllevska teliie tNaod GDalF B2/ MNPave001161612rick 3888.1 59 Patella45 vulgata GDF2/Maverick Capitella teleta GDF2/Maverick 38881 0.85 Capitella teleta GDF2/Maverick 38881 18 80 59 0.91 0.8P5omatoceros lamarckii GDF2/Maverick Pomatoceros Patella lamarckii vulgata GDF2/Maverick GDF2/Maverick 1 27 80 0.99 Pat0e.9ll1a vulgata GDF2P/Momaveatroiccek ros lamarckii GDF2/Maverick Crassostrea gigasPomatoceros GDF2/Maverick lamarckii CAD67714.1 GDF2/Maverick 1 27 0.99CrassostreaP agtiegllasa v GuDlgFa2ta/M GaveDFr2ick/M aveCADric67714k .1 Crassostrea gigas GDF2/Maverick CAD67714.1 0.46 Strongylocentrotus purpuratus Maverick GLEAN3_18248 0.46 DrosophCrassilao msterleaanog gigasast eGr DMFave2/Mriaveck AArickF CAD5932867714.1 .1 55 1 89 Drosophila melanogaster Strongylocentrotus Maverick AAF59328.1 purpuratus Maverick GLEAN3_18248 0.97 Apis mellifera BDMroPs2ophB/Milava m XPela001122118nogaster. 2Maverick AAF59328.1 89 55 0.97 1 98 Apis mellifera Drosophila BMP2B/Maverick melanogaster XP001122118.2 Maverick AAF59328.1 1 StrongyloceAnptisro mtueslli puferrapu BMraPtu2sB M/Maveav rXPick001122118 GLEAN3. 218248 Mus musculus BMP1598 NP033887.1 Apis mellifera BMP2B/Maverick XP001122118.2 1 Strongylocentrotus purpuratus Maverick GLEAN3 18248 0.98 Mus musculus BMP15 NP033887.1 95 Mus musculus BMP15 NP033887.1 Mus musculus GDF9 NP032136.2 0.98 Mus mMuuscsu mluussc GuDluFs9 BNMP032136P15 NP.2033887.1 95 Pomatoceros Mus musculus lamarckii GDF9 UNKNOWN NP032136.2 Adineta vaga CAWI010044114.1 GSMADusV mTu0000781900sculus GD1F9 NP032136.2 80 Capitella teleta PomatocerosUNKNOWN 29529 lamarckii UNKNOWN 1 Adineta vaga CAWI010044114.1 GSADVT00007819001 0.69Adineta vaga CAWI010039441.1 SNAPT00081775001 80 Capitella teleta UNKNOWN 29529 1 Adineta vaga CAWI010044114.1 GSADVT00007819001 Adineta vag0.a69 ACAdinWeIt010043923a vaga CA.1W GI010039441SADVT0001367700.1 SNAPT100081775001 78 Adineta vaga CAWI010039441.1 Adineta vaga CAWI010044114.1 SNAPT00081775001 GSADVT00007819001 Adineta vagMau CAs mWuscI010043923ulus Neu.1rt GurSinAD NVPT0327640001367700.1 1 59 78 Adineta vaga CAWI010039441.1 SNAPT00081775001 1 Mus musculus Neurturin NP032764.1 Adineta vaga CAWI010043923.1 GSADVT00013677001 0.6 Mus musculus GDNF NP 034405.1 59 1 Adineta Mus musculus vaga CAWI010043923.1 Neurturin NP 032764.1 GSADVT00013677001 0.95 Xenopus0 .l6aevisM GuDNs mFu NscPu 001090196lus GDNF. 1NP 034405.1 Danio rerio GDNF NP Mus 571807.1 musculus Neurturin NP 032764.1 Da0n.9i5o rXereinopuo GDNs Flaev NPis 571807 GDNF. 1NP 001090196.1 99 Danio rerio GDNF NP 571807.1 67 Mus99 musculus Danio GDNF rerio NP GDNF 034405.1 NP 571807.1 61 Xenopus67 laevis Mus GDNF musculus NP 001090196.1 GDNF NP 034405.1 61 Xenopus laevis GDNF NP 001090196.1 0.3 0.2 0.3 0.2

Fig. 2. Phylogeny of the BMP-like ligand class familial interrelationships across the Metazoa, as determined by (A) maximum likelihood (Tamura et al., 2011) and (B) Bayesian (Huelsenbeck and Ronquist 2001) methods. Alignment generated by MAFFT (Katoh and Standley 2013) using the L-INS-i strategy resulting in an 88 amino acid informative alignment of the mature ligand domains after the exclusion of gaps. Both phylogenies determined using the WAG model (ML: +4G) (Whelan and Goldman 2001). Bootstrap percentage (of 1,000 replicates) and posterior probabilities (after 3,000,000 generations) can be seen at the nodes of ML and Bayesian trees respectively. Lophotrochozoan sequences underlined in red. Phylogenies rooted with known Neuturin and GDNF outgroups. Sequences used in phylogenetic analysis, along with alignment, can be found in Supplementary File 1. Scale bars represent substitutions per site at given distances. The Lophotrochozoan TGF-b signalling cassette 537

Canonical homologues of BMP 2/4 (Dpp), BMP 5-8 (Gbb/Scw), tentative TGF-b homologue in M. leidyi (Pang et al., 2011) stand- BMP 9/10 (GDF5-7), ADMP and Maverick are also found in the ing as evidence against this. We have included this sequence in Lophotrochozoa as reported by our trees (Fig. 2). These appear the tree shown in Fig. 3, and, as stated in Pang et al., (2011), it is to be far better conserved in the Mollusca and Annelida than in only weakly supported as a homologue of the TGF-b clade sensu other lophotrochozoan lineages examined. No homologues for stricto. Wider taxon sampling at the base of the Metazoa, and GDF9/BMP15 can be found in our lophotrochozoan datasets, particularly in the beyond M. leidyi, would allow us to implying that this is a deuterostome or even innovation test whether TGF-b ligands (in the strictest sense) are indeed a (as suggested by the lack of such a ligand in Lapraz et al., 2006). deuterostome novelty. Fig. 3 shows the inferred identity of the genes of the activin/ Our investigations of the genome of A. vaga revealed a total of myostatin/inhibin-like clade as determined by phylogenetic analysis. 10 ligands with gene models or transcript support for their existence. Clear and reproducible signals were found for a canonical Myostatin Five further genes were present in their entirety in the genome, clade, especially in the case of Bayesian analysis, where Myostatins with some gene models supporting their existence, but without cluster together with a posterior probability of 1. P. lamarcki appears evidence of transcription. As such, these were not included in the to have duplicated this gene, but this is not found in other annelids, curated gene list appearing in that genome’s publication (Flot et and is likely lineage-specific. Support for an Activin/Inhibin clade al., 2013). However, as transcripts could be temporally restricted in is weak in both Bayesian and Maximum Likelihood phylogenies appearance, or present in very low levels, we have included these shown. Clades for TGF-b and Lefty homologues in deuterostomes sequences in Supplementary File 1 for consideration by interested are consistently recovered with good support. On the basis of the parties. In particular, we note the existence of three complete BMP tree presented here, Lefty and TGF-b ligands sensu stricto seem 5-8 genes with no evidence of pseudogenisation. to be deuterostome innovations, with the previous report of a The large number of ligands seen in A. vaga is the result of

76 Mus musculus Inhibin a NP032406.1 A 41 Mus musculus Inhibin b NP032407.1 B Mus musculus Inhibin a NP032406.1 6 Apis mellifera Activin/Inhibin XP001123044.2 0.43 97 Drosophila melanogaster Activin/Inhibin NP651942.2 0.88 Mus musculus Inhibin b NP032407.1 5 Pomatoceros lamarckii Activin/Inhibin 0.75 Mus musculus Inhibin e NP032408.2 Apis mellifera Activin/Inhibin XP001123044.2 Mus musculus Inhibin e NP032408.2 1 2 Drosophila melanogaster Activin/Inhibin NP651942.2 95 Lottia gigantea Activin/Inhibin 151945 Branchiostoma floridae Inhibin beta XP002606835.1 Patella vulgata Activin/Inhibin 0.16 Saccoglossus kowalevskii Activin/Inhibin NP001161496.1 Nematostella vectensis Activin/Inhibin ABF61781.1| 0.36 7 0.17 Brachionus plicatilis Activin/Inhibin/Myostatin-like 2 Branchiostoma floridae Activin/Inhibin XP002606835.1 0.09 Pomatoceros lamarckii Activin/Inhibin 0 60 Saccoglossus kowalevskii Activin/Inhibin NP001161496.1 0.07 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 3 194641 Apis mellifera ALP XP001122210.1 0.04 Nematostella vectensis Activin/Inhibin ABF61781.1 0.04 84 Drosophila melanogaster ALP (Dawdle) NP722840.1 Lottia gigantea Activin/Inhibin 151945 1 Schistosoma mansoni TGFb XP002575902.1 0.92 Patella vulgata Activin/Inhibin Drosophila melanogaster Myostatin AAF59319.1 Lottia gigantea Unknown Activin/Inhibin/Myostatin-like C 238821 18 29 Pomatoceros lamarckii Myostatin A Capitella teleta Myostatin 39276 Pomatoceros lamarckii Myostatin B 0.78Mus musculus Myostatin AAO46885.1 40 1 Nematostella vectensis Myostatin XP001641598.1 0.43Mus musculus GDF11 NP034402.1 0 Branchiostoma floridae Myostatin XP002599461.1 99 Lottia gigantea Myostatin 82990 0.43 19 0.03 Pomatoceros lamarckii Myostatin A Patella vulgata Myostatin 0.65 0.43 Pomatoceros lamarckii Myostatin B Saccoglossus kowalevskii GDF8/11 XP 002734819.1 0.44 Saccoglossus kowalevskii GDF8/11 XP002734819.1 Capitella teleta Myostatin 39276 11 Drosophila melanogaster Myostatin AAF59319.1 Branchiostoma floridae Myostatin XP002599461.1 1 0.65 Lottia gigantea Myostatin 82990 Mus musculus Myostatin AAO46885.1 1 0.02 Patella vulgata Myostatin 96 Mus musculus GDF11 NP034402.1 0 Nematostella vectensis Myostatin XP 001641598.1 Saccoglossus kowalevskii TGFb2 ADB22639.1 0.11 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 2 123463 Mus musculus TGFb1 NP035707.1 98 Ciona intestinalis Lefty NP001071997.1 0.22 0.6 78 Mus musculus TGFb2 NP033393.2 Saccoglossus kowalevskii Lefty NP001164679.1 0.44 34 Mus musculus TGFb3 NP033394.2 1 Branchiostoma floridae Lefty XP002589231.1 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 1 223591 0.14 Mus musculus Lefty 1 NP034224.1 1 Ciona intestinalis Lefty NP001071997.1 Mus musculus Lefty 2 NP796073.1 76 Saccoglossus kowalevskii Lefty NP001164679.1 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 1 223591 0.04 50 Branchiostoma floridae Lefty XP002589231.1 Mus musculus TGFb1 NP035707.1 1 0.96 0 Mus musculus Lefty 1 NP034224.1 0.99Mus musculus TGFb3 NP033394.2 99 Mus musculus Lefty 2 NP796073.1 1 Mus musculus TGFb2 NP033393.2 0.36 Saccoglossus kowalevskii TGFb2 ADB22639.1 2 20 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 2 123463 0.21 Mnemiopsis leidyi TGFbA AEP16383.1 Lottia gigantea Unknown Activin/Inhibin/Myostatin-like A 174837 Brachionus plicatilis Activin/Inhibin/Myostatin-like 1 Mnemiopsis leidyi TGFbB AEP16387.1 0.37 1 Lottia gigantea Unknown Activin/Inhibin/Myostatin-like B 174637 Patella vulgata Unknown Activin/Inhibin/Myostatin-like A 0.95 0.63 Patella vulgata Unknown Activin/Inhibin/Myostatin-like B 20 Lottia gigantea Unknown Activin/Inhibin/Myostatin-like B 174637 26 0.52 Patella vulgata Unknown Activin/Inhibin/Myostatin-like A 78 Patella vulgata Unknown Activin/Inhibin/Myostatin-like B Lottia gigantea Unknown Activin/Inhibin/Myostatin-like A 174837 0.35 Lottia gigantea Unknown Activin/Inhibin/Myostatin-like C 238821 Apis mellifera ALP XP001122210.1 0.98 26 Brachionus plicatilis Activin/Inhibin/Myostatin-like 1 0.32 Drosophila melanogaster ALP (Dawdle) NP722840.1 36 Brachionus plicatilis Activin/Inhibin/Myostatin-like 2 Schistosoma mansoni Activin/Inhibin XP002577586.1 0.77 5 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 3 194641 Schistosoma mansoni TGFb XP002575902.1 0.43 16 Mnemiopsis leidyi TGFbA AEP16383.1 Mnemiopsis leidyi TGFbB AEP16387.1 Schistosoma mansoni Activin/Inhibin XP002577586.1 1 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 4 165201 0.46 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 4 165201 Mus musculus Muellerian-inhibiting factor NP031471.2 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 5 193459 Capitella teleta Unknown Activin/Inhibin/Myostatin-like 5 193459 Mus musculus Muellerian-inhibiting factor NP031471.2 Mus musculus Neurturin NP032764.1 1 Mus musculus Neurturin NP 032764.1 Mus musculus GDNF NP034405.1 1 Xenopus laevis GDNF NP001090196.1 Danio rerio GDNF NP 571807.1 0.64 99 Danio rerio GDNF NP571807.1 Mus musculus GDNF NP 034405.1 90 Xenopus laevis GDNF NP 001090196.1 0.3

0.2

Fig. 3. TGF-b-like class familial phylogenetic interrelationships across the Metazoa, as determined by (A) maximum likelihood (Tamura et al., 2011) and (B) Bayesian (Huelsenbeck and Ronquist 2001) methods. Alignment generated by MAFFT (Katoh and Standley 2013) using the G-iNS-i strategy, resulting in a final 71 amino acid informative alignment. Both phylogenies determined using the WAG mode (Whelan and Goldman 2001). Bootstrap percentage (of 1,000 replicates) and posterior probabilities (after 10,000,000 generations before convergence) can be seen at the nodes of ML and Bayesian trees respectively. Sequences used in phylogenetic analysis, along with alignment, can be found in Supplementary File 1. Phylogenies rooted with known Neuturin and GDNF outgroups. Lophotrochozoan sequences underlined in red. Scale bars represent substitutions per site at given distances. 538 N.J. Kenny et al. two known rounds of lineage-specific whole genome duplication these genes in vivo are as yet uncatalogued, and no homologue in that species (Flot et al., 2013), in contrast to the unknown origin is found in the genome of C. gigas. These sequences therefore of gene duplications in other Lophotrochozoan clades. A. vaga, could represent a gastropod or patellogastropod novelty. In some like B. plicatilis, possesses BMP 3, Nodal and Activin/Inhibin prior analyses (data not shown), platyhelminth sequences form sequences. We have used B. plicatilis sequences to represent a sister group to this clade with poor (< 20 bootstrap support), the Rotifera in our trees as its relatively slow rate of molecular but further evidence from other lophotrochozoans is needed to evolution aided the resolution of other nodes in this figure, but determine whether this is a genuine relationship. have included these three closely-related A. vaga genes to show Outside of these clades, little signal can be recovered to sup- evidence that they cannot be categorised into known clades. We port relationships between a diverse range of other ligands in the find the absence of BMP 2/4 (Dpp) homologues in the Rotifera as Lophotrochozoa. On the basis of the lack of clades forming, even a whole particularly interesting, as these genes play a key role in between such relatively closely related species as C. teleta and establishing dorsoventral polarity in a phylogenetically broad range P. lamarcki, and especially between P. vulgata and L. gigantea, it of species, in concert with Chordin. seems that these ligand sequences are highly variable between It should also be noted that as well as these rotifer sequences, lophotrochozoan species. These sequences are named without several other sequences are not shown in our analysis in Fig. 3, but reference to their evolutionary relationships, and the terms ‘A, B, C’ are provided in Supplementary File 1. The first of these sequences etc in multiple species are used to allow within-species numeration is a P. vulgata Myostatin-like homologue, whose sequence was rather than any inference of orthology. partially recovered from our genomic and transcriptomic data, but Some additional insight could potentially be drawn from the which covers only a portion of the mature ligand sequence and number of cysteine residues found in these sequences, as these was therefore excluded from analysis. The second is a C. teleta residues have a characteristic distribution in some model organ- gene, protein ID 198732, which appears to be a markedly diver- isms. In some vertebrates, TGF-b ligands sensu stricto and Inhibin gent TGF-b class ligand. Its annotation in the C. teleta genome b are said to have nine cysteines, while Inhibin a (along with BMPs suggests that it has been noted as expressed in the EST studies and GDFs in the BMP-like clade) have seven. These pair to form used as the basis for gene prediction, but it appears to have lost four and three disulfide bonds respectively. The remaining cyste- a significant portion of the mature ligand region. This may be the ine residue forms such a bond only when the ligands dimerise to result of pseudogenisation in progress, as without the portions signal. Lefty proteins, as well as GDFs 3, 9 and BMP 15 are said of the ligand domains, it is unlikely the protein it encodes is fully to have only six cysteines - they do not bond covalently to form functional in the same manner as canonical TGF-b class ligands. dimers (Derynck and Miyazono 2008, Moustakas and Heldin 2009). The oyster C. gigas also shows evidence of diversification in TGF-b By our count, in the M. musculus and deuterostome sequences class ligands (data not shown here, see Fleury et al., 2008), al- used in our analysis (Supplementary File 1), Lefty and Muellerian- though when these sequences are added to our analysis no further inhibiting factor proteins possess seven cysteines (lacking the fifth structure was added to our tree - these seem to be fast-evolving, and second respectively when counting from the N terminus of the highly derived sequences, with uncertain homology to the other mature ligand), while all other deuterostome ligands in our dataset lophotrochozoan data presented in this paper. possess at least eight cysteine residues. This may represent the ALP seems to be an insect innovation, forming a clade with clear ancestral condition, with the more derived form studied in detail support and no orthologue seen in any lophotrochozoan or deu- by those papers referenced above. These cysteine positions can terostome species. Muellerian inhibiting factor has been suggested be clearly seen in the BMP alignment file in Supplementary File to be a deuterostome or even vertebrate-specific ligand, but weak 1, in alignment positions 1, 2, 28. 32, 57, 58, 85, and 87 when support groups the C. teleta Unknown Activin/Inhibin/Myostatin-like present. For the TGF-b class alignment, these are positions 1, 2, 4 with this sequence in the Bayesian analysis. Whether this is a 27, 31, 42, 43, 69, and 71. true relationship, or instead is a result of long-branch attraction is The majority of our Lophotrochozoan ligand sequences possess uncertain, but could be tested with increased genomic sequencing eight cysteine residues. Of the uncategorised sequences seen in across the Lophotrochozoa. Fig. 3, the four L. gigantea and P. vulgata sequences mentioned as As with the BMP-like ligands (shown in Fig. 2), TGF-b-like forming a weakly supported clade earlier have 7 cysteines (lacking ligands seem to have been lost from both rotifer species consid- the fifth as counted from the N terminus), as doC. teleta Unknown ered and S. mansoni, with those that are found not falling into Activin/Inhibin/Myostatin-like 4 and C. teleta Unknown Activin/In- identifiable clades. This echoes the findings of previous studies in hibin/Myostatin-like 1, which lack the second from the N terminus. schistosomes (Freitas et al., 2007). It has been suggested that S. One sequence, C. teleta Unknown Activin/Inhibin/Myostatin-like 5, mansoni could use host ligands as part of its signalling cassette has only six cysteines, lacking both the second and the fifth. When (Osman et al., 2006), which would explain low diversity of these cysteines are lost from ligands in vertebrates, they are also lost ligands in this species. The lack of rotifer TGF-b ligands is more from the second and/or fifth position, which further reinforces that unusual, and future sequencing efforts in the Rotifera will reveal some cysteines are vital for maintaining the “cysteine knot” form whether this loss is real, or an artefact of insufficient sequencing of the active ligand, while others can be lost. depth. It should be noted that the rotifer receptor complement is Unfortunately, given the uncertainty regarding the number of also slightly modified, and could reflect changes to ligand sequence cysteines found in canonical groups, this character cannot be and structure in concert with downstream aspects of this cascade. used to further classify our ligands. A better understanding of the Some support is found for a mollusc-specific clade of ligands, interrelationships between these ligands is probably only to be encompassing L. gigantea Unknown Activin/Inhibin/Myostatin-like drawn from denser taxon sampling across the Lophotrochozoa. At A and B and similarly named sequences in P. vulgata. The roles of present, the short length of the mature ligand sequence and the The Lophotrochozoan TGF-b signalling cassette 539 lack of conservation of sequence in the longer propeptide means traditional model organisms. In contrast, loss in ligand comple- that phylogenetic inference, by whatever means, is limited by a ments can be seen in the Rotifera and Platyhelminthes. TGF-b lack of information. sensu stricto and Lefty genes appear to be deuterostome novelties, The lack of constraint on the non-cysteine portions of the as no evidence can be found for their presence in the genomes sequences of these ligands is also interesting from a structural here examined. How these changes in ligand diversity have af- perspective. The sequence of BMP-like ligands seems highly fected lophotrochozoan biology, and the degree to which known constrained, probably because of their vital interactions with recep- ligand families possess ancestrally shared roles mirroring those tors and regulators of their activity. That TGF-b class ligands in performed in other Superphyla, will be a topic of broad interest for the Lophotrochozoan clade can diversify so much, even between developmental biologists in the future. closely related species, when the remainder of the core signalling cascade remains relatively stable raises questions about how these Serine/threonine kinase receptors ligands can successfully maintain their secondary and tertiary Each TGF-b ligand pair binds to Type I and II serine/threonine structures in the face of relatively large sequence changes. This is kinase receptors. When TGF-b ligands form a complex with especially puzzling when so many other TGF-b class ligands have, representatives of both types of receptor, the intracellular kinase presumably under purifying selection, maintained a relatively stable domains of the receptors are brought together and the Type I sequence over evolutionary time across the Metazoa. receptor is phosphorylated and activated (Massagué 1998). In To summarise, of the TGF-b superclass, TGF-b-like ligands humans, a total of seven Type I and five Type II receptors have appear to have diversified greatly in lophotrochozoans, at least been described, while Tribolium castaneum, Apis mellifera and D. in the lineage leading to molluscs and annelids (Fig. 1B), while melanogaster possess a total of five – three Type I, and 2 Type II the BMP class ligand complement is similar to that seen in more (van der Zee et al., 2008).

57 Capitella telata BMP Receptor 1 111904 A 24 Pomatoceros lamarckii BMP Receptor 1 B Crassostrea gigas TGF beta Receptor 1 CAD66433.1 55 Crassostrea gigas BMP Receptor 1 CAE11917.1 Capitella teleta BMP Receptor 1 111904 0.72 24 Patella vulgata BMP Receptor 1 0.82 Pomatoceros lamarckii BMP Receptor 1 91 Lottia gigantea BMP Receptor 1 108334 1 Crassostrea gigas BMP Receptor 1 CAE11917.1 10 Patella vulgata BMP Receptor 1 Branchiostoma floridae BMP Receptor 1 XP002596004.1 1 Apis mellifera BMP Receptor 1B XP391989.2 0.94 Lottia gigantea BMP Receptor 1 108334 35 85 Drosophila melanogaster Thickveins AAN10533.1 Branchiostoma floridae BMP Receptor 1 XP002596004.1 0.59 Mus musculus BMP Type 1A ALK-3 P36895.1 Mus musculus BMP Type 1A (BMP2/4) ALK-3 P36895.1 1 1 97 Mus musculus BMP Receptor 1B NP031586.1 Mus musculus BMP Receptor 1B NP031586.1 25 Apis mellifera BMP Receptor 1B XP391989.2 81 Mus musculus Activin Receptor 1 NP031420.2 1 39 Mus musculus ALK-1 Q61288.2 1 Drosophila melanogaster Thickveins AAN10533.1 28 0.98 Mus musculus Activin Receptor 1 NP031420.2 Drosophila melanogaster Saxophone AAF59189.1 1 Branchiostoma floridae Activin Receptor Type 1 XP002598018.1 0.46 Mus musculus ALK-1 Q61288.2 76 Pomatoceros lamarckii Activin Receptor 1 Branchiostoma floridae Activin Receptor Type 1 XP002598018.1 Pomatoceros lamarckii Activin Receptor 1 44 Capitella telata Activin Receptor 1 109715 1 33 0.84 Capitella teleta Activin Receptor 1 109715 Crassostrea gigas Activin Receptor 1 CAC85263.1 54 0.22 0.76 Crassostrea gigas Activin Receptor 1 CAC85263.1 85 Patella vulgata Activin Receptor 1 0.75 1 Patella vulgata Activin Receptor 1 99 Lottia gigantea Activin Receptor 1 53518 1 Lottia gigantea Activin Receptor 1 53518 85 Patella vulgata TGF beta Receptor 1 Drosophila melanogaster Saxophone AAF59189.1 Lottia gigantea TGF beta Receptor 1 180706 Tribolium castaneum Baboon EFA01312.1 96 Adenita vaga TGF beta Receptor Type I c and d GSADVT00049047001 and GSADVT00062904001 0.48 1 Drosophila melanogaster Baboon AAF59011.1 70 Adenita vaga TGF beta Receptor Type I a and b GSADVT00017122001 and GSADVT00051281001 13 Patella vulgata TGF beta Receptor 1 Brachionus plicatilis TGF beta Receptor Type I 0.97 Lottia gigantea TGF beta Receptor 1 180706 95 92 Tribolium castaneum Baboon EFA01312.1 A. vaga TGF beta Type I c and d GSADVT00049047001/GSADVT00062904001 0 Drosophila melanogaster Baboon AAF59011.1 1 0.31 0.9 A. vaga TGF beta Type I a and b GSADVT00017122001/GSADVT00051281001 17 Crassostrea gigas TGF beta Receptor 1 CAD66433.1 0.67 Brachionus plicatilis TGF beta Receptor Type I 13 21 Pomatoceros lamarckii TGF beta Receptor 1 Brachionus plicatilis BMP Receptor Type I Capitella telata TGF beta Receptor 1 227433 0.31 Capitella teleta TGF beta Receptor 1 227433 0.58 12 Mus musculus TGF beta Receptor 1 BAA05023.1 Pomatoceros lamarckii TGF beta Receptor 1 49 Mus musculus Acvr1c ALK-7 AAI45225.1 Mus musculus TGF beta Receptor 1 BAA05023.1 0.4 0.72 Mus musculus Activin Receptor 1B NP 031421.1 14 0.52 Mus musculus Activin Receptor 1B NP 031421.1 18 Branchiostoma floridae TGF Beta Receptor Type 1 XP002598020.1 0.83 Branchiostoma floridae TGF beta Receptor Type 1 XP002598020.1 Brachionus plicatilis BMP Receptor Type I Mus musculus Acvr1c ALK-7 AAI45225.1 72 Mus musculus Activin Receptor 2A CAM14875.1 Branchiostoma floridae Activin Receptor Type 2 XP002598117.1 69 Mus musculus Activin Receptor 2B NP 031423.1 Mus musculus Activin Receptor 2B NP 031423.1 1 0.97 10 Branchiostoma floridae Activin Receptor Type 2 XP002598117.1 Mus musculus Activin Receptor 2A CAM14875.1

Capitella telata Activin Receptor 2 94926 0.89 Capitella telata Activin Receptor 2 94926 Crassostrea gigas Activin Receptor 2 CAR92545.1 21 Crassostrea gigas Activin Receptor 2 CAR92545.1 0.89 58 Patella vulgata Activin Receptor 2 0.92 Lottia gigantea Activin-like Receptor 2 123928 1 95 Lottia gigantea Activin-like Receptor 2 123928 0.21 Patella vulgata Activin Receptor 2 92 0.88 Apis mellifera Activin Receptor 2B XP395928.4 0.49 Pomatoceros lamarckii Activin Receptor 2 36 67 Drosophila melanogaster Punt AAF55079.1 Drosophila melanogaster Punt AAF55079.1 1 Pomatoceros lamarckii Activin Receptor 2 Apis mellifera Activin Receptor 2B XP395928.4 Brachionus plicatilis Activin-like Receptor Type II Adinita vaga AR2 c and d GSADVT00027945001/00011539001 1 0.96 93 Mus musculus TGF beta Receptor 2 NP 083851.3 0.67 Adenita vaga AR2 a and b GSADVT00027945001/00011539001 34 Branchiostoma floridae TGF Beta Receptor Type 2 XP002612985.1 Brachionus plicatilis Activin-like Receptor Type 2 Branchiostoma floridae TGF Beta Receptor Type 2 XP002612985.1 Adenita vaga Activin Receptor Type 2 a and b GSADVT00034553001 and GSADVT00051624001 1 75 Adenita vaga Activin Receptor Type 2 c and d GSADVT00027945001 and GSADVT00011539001 Mus musculus TGF beta Receptor 2 NP 083851.3 0.39 13 Mus musculus BMP Receptor 2 NP 031587.1 Mus musculus BMP Receptor 2 NP 031587.1 Apis mellifera Wishful Thinking XP397334.3 70 Capitella telata BMP Receptor 2 117843 1 99 Pomatoceros lamarckii BMP Receptor 2 Drosophila melanogaster Wishful Thinking AAF47832.1 0.89 0.61 Crassostrea gigas BMP Receptor 2 CAD20574.1 21 Apis mellifera Wishful Thinking XP397334.3 55 0.91 Lottia gigantea BMP receptor 2 172579 Drosophila melanogaster Wishful Thinking AAF47832.1 0.69 1 Patella vulgata BMP Receptor 2 17 Crassostrea gigas BMP Receptor 2 CAD20574.1 Pomatoceros lamarckii BMP Receptor 2 46 Patella vulgata BMP Receptor 2 1 Capitella teleta BMP Receptor 2 117843 99 Lottia gigantea BMP receptor 2 172579

0.2 0.2

Fig. 4. TGF-b and BMP receptor molecule interrelationships across the Metazoa, as determined by (A) maximum likelihood (Tamura et al., 2011) and (B) Bayesian (Huelsenbeck and Ronquist 2001) methods, and rooted at the midpoint. Predominantly TGF-b-like and BMP-like cascade receptors shown in green and blue respectively. Alignment generated by MAFFT (Katoh and Standley 2013) using the G-iNS-i strategy resulting in a 136 informative amino acid alignment spanning the protein kinase domain (PFAM PF00069). Both phylogenies determined using the WAG model (ML +4G) (Whelan and Goldman 2001). Bootstrap percentage (of 1,000 replicates) and posterior probabilities can be seen at the nodes of ML and Bayesian trees respectively. Lophotrochozoan sequences underlined in red. Sequences used in phylogenetic analysis, along with alignment, can be found in Supplementary File 1. Scale bars represent substitutions per site at given distances. 540 N.J. Kenny et al.

The serine/threonine kinase receptor complements of the Lo- (data not shown), although this gene is grouped with A. vaga and photrochozoa have previously been the subject of investigation, B. plicatilis TGF-b R1 genes in the Bayesian tree shown in Fig. 4. with those of C. gigas already described (Herpin 2005). Coupled This gene could therefore represent a diverged copy of a rotifer with our extensive knowledge of the diversity of these molecules in TGF-b R1 gene after duplication in that lineage, or alternately rotifer the freshwater Ephydatia fluviatilis (Suga et al., 1999) and receptors could be becoming more similar through gene conversion. other non-bilaterian metazoans (Humineicki et al., 2009, Pang et The divergent nature of the TGF-b ligands found in these species al., 2011) these are perhaps the best-catalogued components of may suggest that their receptor sequences are evolving to signal the TGF-b cascade. With some exceptions, the serine/threonine in a derived fashion, although the true causes of these changes kinase receptor complement varies little across the Metazoa. requires further research. Generally three Type I and two Type II receptors are found in any Schistosome receptor complements have been studied else- species, with TGF-b-like signalling occurring through a set pair where in depth (Davies et al., 1998, Forrester et al., 2004), and of dimerised Type I and Type II receptors (TGF-b R1 and Act R2, in many ways their complements represent a surprising finding, also known by a diverse range of other names), while two Type I given the presence in S. mansoni of only two TGF-b-like ligands. receptors (BMP R1 and the misleadingly named Act R1) can each These complements do not map exactly onto the canonical cas- be found in complex with BMP R2. sette, but, as hypothesised in Osman et al., (2006) and earlier in In vertebrates, considerably more diversity of receptor number this manuscript, their quantity, when compared to the few ligands exists, most likely due to the 2R whole genome duplications. Larger encoded in its genome, may suggest that these molecules respond numbers of receptor also exist in invertebrate deuterostomes and to host, rather than endogenous, signalling cues. the cnidarian Nematostella vectensis, most likely due to independent In short, the relative conservation of serine/threonine kinase duplications in these lineages, some of which have been traced receptor sequences within annelids and molluscs confirms the to specific nodes on the deuterostome tree of life (Humineicki et suggestion of Herpin et al., (2005) with regard to the broad con- al., 2009). servation of serine/threonine kinase receptor diversity across While the canonical five serine/threonine kinase receptors are metazoan evolution. found in all annelid and mollusc species examined (Fig. 4), these have diversified in the . BothA. vaga and B. plicatilis possess Smad proteins at least one Activin R2 and TGF-b R1 gene, although the four A. Smad proteins play a key role in transducing extracellular signals vaga Activin R2 genes are drawn toward the base of our ML tree into an intracellular response. Of all the parts of the TGF-b signal- by long branch attraction. One B. plicatilis sequence appears to be ling cascade investigated in the present study, it is these molecules a divergent BMP R1 by Blast identity and in some of our ML trees that show the least amount of loss and disparity in number across

99 Lottia gigantea SMAD1/5/8 104873 A B BrachionuLottias gigplicaanttilieas SMAD21//35/8 104873 20 Patella vulgata SMAD1/5/8 1 Pomatoceros lamarckii SMAD1/5/8 LoPattteiall agig vualngtaeata SMAD1AD1/5/8 104873 22 PaPteollmaa vtuolcergatao sSM lamarckAD1/5/ii8 SMAD1/5/8 11 Capitella teleta SMAD1/5/8 173019 1 0.9 Schistosoma mansoni SMAD1/5/8 AAV80239.1 Mus musculus SMAD1 NP032565.2 0.78P omatoceros lamarckii SMAD1/5/8 CapitellaS ctehliesttao sSMomaAD ma1/5n/8s 17301oni SM9AD1/5/8 AAV80239.1 10 Mus musculus SMAD5 NP032567.1 82 CMaupist emlluasc teululetas SMSMAD11/ 5N/8P 032565173019.2 Mus musculus SMAD8 AF175408 1 0.84 MDruso msophilusculua ms eSMlanogADas5 NtePr032567 MAD AA.1 F51142.1 90 Schistosoma mansoni SMAD1/5/8 AAV80239.1 0.99 Fig. 5. Smad and Dad interrela- 32 MTruibolius mumsc casulust aSMneADum8 M AADF175408_ EFA056631 .1 Brachionus plicatilis SMAD1/5/8 1 0.72 NematoBrstacellahionu vectse plinscais tSMilisAD SM1AD/5/81 /ABC5/8 88374.1 tionships across the Metazoa, Nematostella vectensis SMAD1/5/8 ABC88374.1 NematoBrstacellahionu vectse plinscais tSMilisAD SM1AD/5/81 /ABC5/8 88374.1 43 Drosophila melanogaster MAD AAF51142.1 0.98 as determined by (A) maximum 73 83 MDruso msophilusculua ms eSMlanogAD1as NtPer032565 MAD AA.2 F51142.1 Tribolium castaneum MAD EFA05663.1 0.88 0.98 MTursiboliu muscmulu cass SMtanADeum5 NMPAD032567 EFA05663.1 .1 likelihood (Tamura et al., 2011) 35 Schistosoma mansoni SMAD2/3a XP002579703.1 Apis mMeullisf emrau scSmuluoxs XPSM396056AD8 AF.4175408_1 0.83 Schistosoma mansoni SMAD2/3b XP002577809.1 0.88 DrApios ophilmelliafe mrae lSamnogoxas XPte396056r Smox. 4NP511079.1 and (B) Bayesian (Huelsenbeck 9 Nematostella vectensis SMAD2/3 XP001631657.1 DrNeomsophilatostae llmae veclanogtenassitse rSM SmADox2 /N3P XP511079001631657.1 .1 and Ronquist 2001) methods. 91 Apis mellifera Smox XP396056.4 MNuesm mautscosulutellsa SMvecADten2s NisP SM034884AD2/.32 XP001631657.1 20 0.99 Drosophila melanogaster Smox NP511079.1 Muss muscsculuss SMAD23 NP034884058049..23 Alignment generated by MAFFT 93 23 Mus musculus SMAD2 NP034884.2 MuSsc mhiussctosuluomas SM maADns3on NiP SM058049AD2/.3a3 XP002579703.1 72 0.91 0.8 (Katoh and Standley 2013) using Mus musculus SMAD3 NP058049.3 SchSiscthoisotomaso mama nmasonnsi onSMi AD2SMAD2/3a /XP3b 002579703XP002577809.1 .1 Pomatoceros lamarckii SMAD2/3 28 94 Lottia gigantea SMAD2/3 189014 Schistosoma mansoni SMAD2/3b XP002577809.1 the G-iNS-i strategy, with the sec- 1 Patella vulgata SMAD2/3 CPaopimatelltoacer teleotsa lSMamarckAD2/ii3 SM16786AD23/3 96 39 tion used for analysis a 139 infor- Capitella teleta SMAD2/3 167863 1 PCaatepilltae vllau ltgealetata SM SMAD2AD2/3/3 167863 Pomatoceros lamarckii SMAD2/3 PLoattteillaa gig vualgnateata SMAD2AD2//33 189014 mative amino acid region spanning Brachionus plicatilis SMAD2/3 Brachionus plicatilis SMAD2/3 Lottia gigantea SMAD2/3 189014 Brachionus plicatilisBr SMacADhionu4 s plicatilis SMAD6/7 the MH2 domain (Pfam PF03166). Brachionus plicatilis SMAD4 0.67 Schistosoma mansSonchi iSMstoAD4soma XP ma002574840nsoni SM.AD61 /7 XP002569925.1 99 Schistosoma mansoni SMAD4 XP002574840.1 Both phylogenies determined us- 1 PomatocerDroso lamarcksophila iim SMelaADnog4aster DAD NP477260.1 Pomatoceros lamarckii SMAD4 38 Lottia giganNteaem SMatoADste4ll a13107 vecte8nsis SMAD6/7 XP001631691.1 ing the WAG mode (Whelan and 99 Lottia gigantea SMAD4 131078 1 20 Patella vulgMatuas SM muADsc4ulus SMAD7 NP001036125.1 Patella vulgata SMAD4 0.86 Goldman 2001), rooted with the 50 0.54 Capitella teleta ASMpisAD m4e lli17936fera DAD8 XP396816.3 Capitella teleta SMAD4 179368 13 Mus musculuMuss SMmuADsculu4 NsP SM032566AD6 .N2 P032568.3 known Smad 6/7 clade (ML) and at Mus musculus SMAD4 NP032566.2 1 0.72 Nematostella Lovectttiae ngigsisa nSMteaAD SM4 ADEDO6/313827 14551.1 6 41 midpoint (Bayesian). Bootstrap per- Nematostella vectensis SMAD4 EDO31382.1 0.77 Apis melliferPaa Mteelldaea vu (SMlgatADa SM4)AD6 XP392838/7 .4 37 1 Apis mellifera Medea (SMAD4) XP392838.4 Drosophila mePlaonogmaastocerter oMse ldamarckea (SMiiAD SM4)AD6 AAC/738971.1 centage (of 1,000 replicates) and 87 Drosophila melanogaster Medea (SMAD4) AAC38971.1 0.89 CapitellBra tacelehionuta SMsAD plica6/7t ili14708s SMAD3 6/7 54 Brachionus plicatilis SMAD6/7 0.88 Brachionus pliScachtiilistso SMsomaAD ma4 nsoni SMAD6/7 XP002569925.1 posterior probabilities can be seen Schistosoma mansoni SMAD6/7 XP002569925.1 1 SchistosoDrmao smaophilnsona mi SMelaAD4nogas XPte002574840r DAD NP477260.1 .1 at the nodes of ML and Bayesian 96 Drosophila melanogaster DAD NP477260.1 PomatoceroNs elamarckmatosteiill SMa vecADt4ensis SMAD6/7 XP001631691.1 1 34 Nematostella vectensis SMAD6/7 XP001631691.1 Lottia giganMteaus SM muADsc4ulu 13107s SM8AD7 NP001036125.1 trees respectively. Lophotrocho- 1 82 Apis mellifera DAD XP396816.3 Patella vulgataA SMpisAD m4ellifera DAD XP396816.3 zoan sequences underlined in red. 67 Mus musculus SMAD7 NP 001036125.1 C0.54api tella telMetuas SM muADsculu4 17936s SMAD8 6 NP032568.3 34 Mus musculus SMAD6 NP032568.3 M0.65us musc1ulu sLo SMttiADa gig4 NanPt032566ea SMAD.2 6/7 145516 Sequences used in phylogenetic Nematostella Pvecatetllean sviusl gSMataAD SM4 AD6EDO/313827 .1 26 97 Lottia gigantea SMAD6/7 145516 0.9 analysis, along with alignment, can Apis me1lli fera PMoemadeato (SMceroADs lamarck4) XP392838ii SMAD6.4 /7 44 Patella vulgata SMAD6/7 Drosophila melanogCapiastetellra Mteeledteaa SM (SMADAD6/47) 14708AAC389713 .1 be found in Supplementary File 1. 62 Pomatoceros lamarckii SMAD6/7 Capitella teleta SMAD6/7 147083 0.3 Scale bars represent substitutions

0.2 per site at given distances. The Lophotrochozoan TGF-b signalling cassette 541 the Metazoa, presumably due to the pleiotropic effects that would cascades. This class has diverged into a large and confusingly happen if loss were to occur. Smad molecules respond to a di- named clade of genes, especially in vertebrates, where the 2R verse range of incoming signals, as can be seen in Fig. 1A, and whole genome duplication event likely allowed sub- and neo- even if loss of one ligand occurs in a lineage, a Smad may still functionalisation to occur. The results of phylogenetic analyses be responsible for passing on the message brought by another of members of this gene class from species across the Metazoa ligand. Such pleiotropy means that loss may be less tolerated by (Fig. 6) reveal how this difficult-to-catalogue group has evolved. natural selection at this step of the signalling cascade than others. It appears that the cassette of DAN-class members found in All lophotrochozoan Phyla studied in the present investigation the common ancestor of deuterostomes and protostomes may were found to contain at least one of each of the four major families have resembled that of N. vectensis, with two homologues giving of Smad molecule (Fig. 5). Interestingly, the rotifer A. vaga (but rise to the diversity we see today. It is possible that the Cerberus/ not B. plicatilis) has lost Smad 6/7 (Dad) and thus is unlikely to Dante family represents a deuterostome innovation - despite the inhibit Smad signalling using this mechanism. The short branch placement of N. vectensis Cerberus/Dante-like (ABF06563.1) at lengths generally found outside the inhibitory Smad (Smad6/7 or the base of this clade. Given the widespread presence of Gremlins Dad) clade also point to generally constrained selection on these across the Metazoa, and previous study in this group in Cnidarians molecules. Both B. plicatilis and S. mansoni show longer branch (Rentzsch et al., 2006), it perhaps would be parsimonious to infer lengths for these sequences, however, which may be a result of that it is in fact a Gremlin, rather than inferring loss of a cnidarian co-evolution to interact with the divergent receptor cassettes also Gremlin and Cerberus/Dante-like factors. We cannot seen in these Phyla. distinguish between these alternatives definitively with the data S. mansoni shows evidence of a lineage-specific Smad2/3 du- available, but this hypothesis could be easily tested with the advent plication, which is perhaps surprising in light of the reduced ligand of broader sequence availability. complement of this species, and may represent a subfunctionali- No wider DAN class genes can be found in schistosomes or in sation of roles previously performed by a single ancestral gene. the rotifers A. vaga or B. plicatilis, and loss of portions of this class We note that we do not recover a monophyletic clade of Smad are prevalent in other species - D. melanogaster, for example, has 2/3 sequences in our Bayesian analysis, which is largely due to no DAN class genes in its genome (van der Zee et al., 2008), and the small differences in sequence between the R-Smad clades. DANs sensu stricto have been lost across the Lophotrochozoa (Table 1). DANs sequester a variety of ligands, with specific- Dan/Cerl/Coco/Prdc/Cerberus/Gremlin ity varying depending on the DAN protein examined. Some are Members of the wider DAN-like gene class sequester ligands, specific to the BMP-like signalling pathway, while others (such as preventing them from binding to receptors and activating signalling Cerberus) can inhibit Nodal in the TGF-like pathway. Gremlin is

1 / 96 Homo sapiens Prdc NP071914.3 1 / 77 Mus musculus Prdc NP035955.1 0.99 / 90 Gallus gallus Prdc XP419552.3

0.98 / 63 Danio rerio Prdc NP001017704.1 Danio rerio Gremlin NP998017.1 Gallus gallus Gremlin NP990309.1 Fig. 6. DAN class interrelation- 1 / 65 1 / 85 ships across the Metazoa, as de- 0.89 / 96 Mus musculus Gremlin NP035954.1 0.97 / 81 termined by maximum likelihood 0.87 / 36 Homo sapiens Gremlin NP037504.1 (Tamura et al., 2011) and Bayesian Branchiostoma floridae Gremlin XP002604466.1 Culex quinquefasciatus Gremlin XP001842189.1 (Huelsenbeck and Ronquist 2001) 0.99* / 23 95 methods. Alignment generated by 1 / Tribolium castaneum Gremlin EFA02797.1 Saccoglossus kowalevskii Gremlin NP001161561.1 0.99* / 65 MAFFT (Katoh and Standley 2013) * / using the G-iNS-i strategy. Phylog- 10 Lottia gigantea Gremlin 88176 0.95* / 53 1 / 99 Patella vulgata Gremlin enies calculated on the basis of an Gremlins Pomatoceros lamarckii Gremlin Prdc also known as Gremlin-2 87 informative amino acid alignment 1 / 83 Capitella teleta Gremlin 57284/205003 spanning the DAN domain (Pfam ID Helobdella robusta Gremlin AEL12445.1 PF03045). Phylogeny shown is the ^ Nematostella vectensis Dan XP001641376.1 result of ML analysis, with differences Saccoglossus kowalevskii Dan NP001158417.1 in topology using Bayesian methods 0.97 / 65 1 / 78 Homo sapiens Dan NP001191015.1 Dans indicated with a dotted line. Both phy- 1 / * 0.56 / 65 neuroblastoma, suppression Danio rerio Dan NP996980.1 of tumorigenicity 1 logenies determined using the WAG * / 23 Apis mellifera Dan XP001121480.1 mode (Whelan and Goldman 2001) 1 / 94 Tribolium castaneum Dan XP972176.1 and rooted at midpoint. Posterior 0.8 / 42 Nematostella vectensis Cerberus/Dante-like ABF06563.1 probabilities/bootstrap percentage Saccoglossus kowalevskii Cerberus/Dante-like NP001164702.1 (of 1,000 replicates) and can be seen Branchiostoma floridae Cerberus/Dante-like ACF94996.1 Cerberuses 0.97 / 24 at the base of nodes. Lophotrocho- 1 / 96 Homo sapiens Dante NP689867.1 and Dantes zoan sequences underlined in red. 0.64 / 25 Mus musculus Dante NP957679.1 Other names: Caronte/Cer1 (Cerberus) and Coco/Grem3/Sp1/Cer2/DAN Sequences used in phylogenetic 1 / 76 Gallus gallus Cerberus NP990154.1 domain family member 5 (Dante) analysis, along with alignment, can be 1 / 98 Homo sapiens Cerberus NP005445.1 found in Supplementary File 1. Scale 1 / 91 Mus musculus Cerberus NP034017.1 bars represent substitutions per site ^ In Bayesian tree, Dans are sister group to (Gremlins + Cerberuses and Dantes) at given distances. 0.2 with posterior probabilities as shown. 542 N.J. Kenny et al. known to be active in H. robusta, targeting BMP 2/4 preferentially signalling, by freeing ligands from Chordin after the Chordin-ligand (Kuo and Weisblat 2011). Much investigation is required before complex has been cleaved by Tolloid (Oelgeschläger 2000). This inference can be made as to the wider roles of genes that remain could also occur in some of the lophotrochozoan clades presented in the Lophotrochozoa, given the wide range of actions that these here, although annelids lack canonical Chordin homologues while genes are known to have in more established model organisms. possessing Tsg. This mooted role could therefore be absent in this , although Chordin-like may be targeted by Tsg instead. Chordin The phylogenetic relationships of a number of metazoan Tsg Chordin is a BMP regulatory molecule, which sequesters ligands sequences can be seen in Fig. 7B. Mollusc and annelid species all by binding to them. It is best known for its role antagonising BMP possess a single Tsg homologue, while all schistosomes examined 2/4 (also known as Dpp) in dorsoventral patterning. A similar mol- and the rotifers B. plicatilis and A. vaga appear to have lost theirs, ecule, Chordin-like, has also been identified, initially as a BMP 4 as none can be found in their genomes or transcriptomes. This antagonist in the chick (Sakuta et al., 2001). could be correlated to the markedly reduced BMP complements Searches through the lophotrochozoan genomes examined in of these species – without the ligand diversity found in other spe- the present project suggest that both Chordin and Chordin-like cies, it is perhaps unsurprising that regulatory mechanisms have genes are present within the Lophotrochozoa, as can be seen in also been lost. Fig. 7A. Chordin itself seems to be lost across the Annelida. No Chordin or Chordin-like sequences were found in the rotifers A. Noggin vaga and B. plicatilis or in the Platyhelminthes, although other more Noggins are proteins that act to interfere with canonical TGF-b derived families related to Chordin (eg CRIM) were not examined signalling by sequestering ligands before they can bind to their in the above analysis. The presence of clear Chordin-like genes receptors, as with Chordin above. Noggins primarily have been in the Mollusca confirms the hypothesis that Chordin-like genes shown to interact with BMP-like ligands. Our data (Fig. 7C) confirms were present in the Bilaterian common ancestor, rather than being the study of Molina et al., (2011), who posited the existence of two a deuterostome novelty, and is corroborated by the assignment major clades of Noggin across the Metazoa, a canonical Noggin of T. adhaerens and Amphimedon queenslandica Chordin-like clade, and a less well-categorised Noggin-like clade. We find both homologues within this family by other authors, rather than as a kinds in all lophotrochozoans examined with the exception of S. canonical Chordin (sensu Richards and Degnan 2009). mansoni, which has two Noggin-likes but no Noggin, and the roti- fers A. vaga and B. plicatilis, which lack Noggins entirely. It is well Twisted gastrulation (Tsg) documented that canonical Noggins and Noggin-like proteins are Tsg is a modulator of BMP signalling, although its mode of ac- absent from the genomes of some ecdysozoan model organisms tion is yet to be fully understood. As well as binding BMPs, it has (van der Zee et al., 2008) although these are found in some other been suggested that Tsg might also act as a promoter of BMP arthropods (Duncan et al., 2013): the presence of a Noggin in the

TABLE 1

e e

k

i k

s l

n i r i

- l s s t -

o s s

n n n

t I a i i F i n n s s d d t l i i

i p O d d B e R d s n g g r r s t e i s o m a l l a U M M g g o o n n c l l g e g r a a A O o o e h h M m o s o i G D D B N S F T T N L N S R C C References

s Homo sapiens 33 8 13 1 2 1 0 1 2 1 2 1 3 1 3 2 Huminiecki et al 2009 e

m 10 5 7 1 1 1 0 0 1 0 0 1 1 0 1 1 o Ciona intestinalis Hino et al 2003/Huminiecki t

s et al 2009 o r e

t Strongylocentrotus 14 4 6 1 1 0 1 1 1 1 0 1 3 0 2 1 Lapraz et al 2006

u purpuratus e D

s Drosophila melanogaster 7 4 5 1 0 0 0 1 0 0 0 3 2 0 1 1 van der Zee et al 2008 n a

o Apis mellifera 7 4 5 1 0 0 0 0 0 1 0 1 1 1 1 1 van der Zee et al 2008 z o

s Tribolium castaneum 8 4 5 1 0 0 0 1 1 1 0 1 1 1 1 1 van der Zee et al 2008 y

d Caenorhabditis elegans 5 7 3 0 0 0 0 0 1 0 0 0 1 0 1 0 Savage-Dunn 2005, c

E Huminiecki et al 2009 Capitella teleta 16 4 5 0 1 1 1 1 1 0 0 1 2 1 1 1

s

n Pomatoceros lamarckii 9 4 5 0 0 1 1 1 1 0 0 1 1 1 1 1 a

z o Lottia gigantea 10 4 5 1 1 1 1 1 1 0 0 1 1 1 1 1 o

h Patella vulgata 12 4 5 1 0 1 1 1 1 0 0 1 1 1 1 1 c o r

t Brachionus plicatilis 4 4 3 0 0 0 0 0 0 0 0 0 0 0 1 0 o h

p Adenita vaga 10* 10 8 0 0 0 0 2 0 0 0 0 0 0 2 0 Flot et al 2013 *NB see o Supplementary File 1 L Schistosoma mansoni 2 5 5 0 0 1 2 1 0 0 0 0 1 0 1 0

s

t Nematostella vectensis 6 4 6 1 0 1 1 1 1 1 0 0 1 0 1 0 Huminiecki et al 2009, Saina s a

l and Technau 2009 b o

l Mnemiopsis leidyi 9 5 4 0 0 0 0 0 0 0 0 0 1 0 1 1 Pang et al 2011 p i

D Trichoplax adherens 5 4 4 0 1 1 0 1 0 1 0 0 0 0 1 1 Huminiecki et al 2009

TGF-b signalling complements of a range of metazoan species, as determined in the present manuscript or from previously published work as cited at right, where well annotated examples of these complements have been determined from other studies. While no whole-genome analysis of this pathway has yet been performed in the Porifera, we refer the interested reader to Suga et al., 1999, Adamska et al., 2007, and Fig. 11 of Pang et al., 2011, where a number of the key families listed above are described in this Phylum. The Lophotrochozoan TGF-b signalling cassette 543 Daphnia pulex and Noggin and Noggin-like in the he- Follistatin sequences could be identified within our B. plicatilis mipterans Acyrthosiphon pisum and Rhodnius prolixus implies that dataset, although two are present in A. vaga (CAWI010044267.1 this loss happened independently in some insects and nematodes. and CAWI010043776.1). The presence of Noggin-like sequences in schistosomes again Only Follistatin homologues with the classical Follistatin/ raises questions as to their function – they could interact with the Osteonectin-like EGF domain followed by three Kazal-type serine TGF-like ligands found in these species, unlike their role in BMP protease inhibitor domains were examined by the present study, regulation in other Superphyla, or instead they may be involved in with the exception of the N. vectensis homologue, which lacks a regulating exogenous signals or other processes entirely. We do clearly identifiable EGF domain. Extensive diversification of this not detect the diversity of Noggin sequences in other lophotrocho- gene has taken place within the deuterostome lineage as early zoans that are found in S. mediterreanea, where eight Noggin and as the /chordate split, with loss, rearrangement and Noggin-like sequences are found (Molina et al., 2011). We therefore gain of various domains resulting in a total of up to five follistatin- posit that this expansion is lineage specific, and may be related to domain containing genes, descended from an ancestral Follistatin species-specific roles for ligands, perhaps in regeneration. sequence. The nature, role and diversification of these are yet to be fully investigated. Follistatin Follistatin binds to and inhibits TGF-b 1 / 99 Drosophila melanogaster Short Gastrulation NP476736.1 ligands, particularly Activins, although it can A * / 2 8 Anopheles gambiae Short Gastrulation XP310620.4 bind to other ligands, even those in the BMP- * / 4 1 Crassostrea gigas Chordin EKC31975.1 0.52 / * 1 / * like class. Canonical Follistatin sequences are Lottia gigantea Chordin 164016 * / 86 found in the Lophotrochozoa, as can be seen in 1 / 99 Patella vulgata Chordin the phylogenetic tree presented in Fig. 8A. We Nematostella vectensis Chordin ABC88373.1 0.97 / 62 Homo sapiens Chordin AAC69835.1 note that Platyhelminthes possess Follistatin, 1 / 100 0.52 / 54 Danio rerio Chordin NP571048.1 despite not having a canonical Activin ligand 1 / 94 Homo sapiens Chordin-like 1 NP 001137454.1 for it to bind to. As noted earlier, it is suspected Danio rerio Chordin-like protein 2 NP 001124431.1 that parasitic platyhelminths can utilise host 0.99 / 74 Crassostrea gigas Chordin-like EKC43057.1 ligands, so it is possible that these Follistatins 0.97 / 58 Capitella teleta Chordin-like 224618 inhibit the action of these, but it is perhaps more 0.64 / 35 Lottia gigantea Chordin-like 231226 Homo sapiens BMP-binding endothelial regulator NP 597725.1 likely that Follistatins bind the more derived Activin-like ligands found in these species. No 0.2 1 / 93 Homo sapiens Tsg NP065699.1 B 0.92 / 68 Danio rerio Tsg1A NP571872.1 0.84 / 59 Branchiostoma floridae Tsg XP002593913.1 Fig. 7. Chordin (A), Twisted Gastrulation (B) and * / 46 Saccoglossus kowalevskii Tsg NP001158407.1 Noggin (C) interrelationships across the Metazoa, Drosophila melanogaster Crossveinless NP536786.1 as determined by maximum likelihood (Tamura 1 / * 1 / 9 8 Anopheles gambiae Crossveinless XP310820.4 et al., 2011) and Bayesian (Huelsenbeck and 0.94 / 61 Capitella teleta Tsg 165012 Ronquist 2001) methods. Phylogeny shown is the Pomatoceros lamarckii Tsg result of ML analysis, with differences in topology 0.74 / 43 Lottia gigantea Tsg 231460 using Bayesian methods indicated with a dotted 0.99 / 89 Patella vulgata Tsg line. Chordin phylogeny based on a 120 informative Homo sapiens IGFBP1 NP 000587.1 amino acid alignment of the von Willebrand factor type C/D domains and C8 domains generated by 0.2 MAFFT (Katoh and Standley 2013) using the G-iNS-i 1 / 93 Homo sapiens Noggin AAA83259.1 C 0.84 / 81 Xenopus laevis Noggin 1 AAI69672.1 strategy, analysed under the JTT model (Jones et al., Danio rerio Noggin 3 NP571057.1 1 / 96 1992, ML) and Dayhoff model (Dayhoff et al., 1978, 1 / 100 Danio rerio Noggin 1 NP571058.2 Noggin / Noggin A Bayesian) with tree shown rooted with H. sapiens Danio rerio Noggin 2 NP571067.1 Clade BMP-binding endothelial regulator (NP 597725.1). Branchiostoma floridae Noggin ABG66526.1 0.69 / 0.66 / 47 Xenopus tropicalis Noggin 2 NP001007476.1 Tsg phylogeny based on a 146 informative amino 29 acid global alignment generated by MAFFT using the 0.38 / 37 Lottia gigantia Noggin A 127878 1 / 99 Patella vulgata Noggin A G-iNS-i strategy, rooted with H. sapiens IGFBP (NP 0.49 / 60 Capitella teleta Noggin A 155479 000587.1) after Vilmos et al., (2001), with both phy- 0.99 / 60 1 / 93 Pomatoceros lamarckii Noggin A logenies determined using the WAG model (Whelan Daphnia pulex Noggin 55080 and Goldman 2001). Noggin phylogeny based on a 103 Nematostella vectensis Noggin 1 ABF61775.1 informative amino acid global alignment generated by Nematostella vectensis Noggin 2 ABF61776.1 MAFFT using the G-iNS-i strategy rooted at midpoint, Strongylocentrotus purpuratus Noggin XP784090.2 followed by analysis using the WAG model in both 1 / 99 Xenopus tropicalis Noggin 4 NP001037873.1 Takifugu rubripes Noggin 4 AAX07475.1 0.71 / 23 analyses. Posterior probabilities/bootstrap percentage 0.99 / 60 1 / 96 Patella vulgata Noggin B Noggin-like / Noggin B (of 1,000 replicates) and can be seen at the base of 0.48 / 31 Lottia gigantia Noggin B 127771 Clade nodes. Lophotrochozoan sequences underlined in Pomatoceros lamarckii Noggin B red. Sequences used in phylogenetic analysis, along * / 40 0.58 / 17 Schistosoma mansoni Noggin B2 XP002570128.1 1 / 99 Schistosoma mansoni Noggin B1 XP002578816.1 with alignment, can be found in Supplementary File 0.67 / * 1. Scale bars represent substitutions per site at given Capitella teleta Noggin B 207073 distances. 0.2 544 N.J. Kenny et al.

Tolloids A 1 / 99 Homo sapiens Follistatin FST344 NP 037541.1 Tolloids are found extracellularly, and cleave 1 / 10 0 Gallus gallus Follistatin NP 990531.1 regulators of TGF-b signalling when they have Danio rerio Follistatin-A NP 571112.3 0.99 / 8 0 formed complexes with free ligands, releasing the 1 / 71 Danio rerio Follistatin-B NP 001034720.1 ligand they have bound and allowing signalling to 0.98 / 60 Saccoglossus kowalevskii Follistatin-like XP 002736683.1 Drosophila melanogaster Follistatin NP 652376.2 occur. This action is vitally important in regulating 0.9 / 29 a range of developmental processes, including 1 / 9 4 Anopheles gambiae Follistatin XP 316346.4 Patella vulgata Follistatin the establishment of dorsoventral polarity, Fig. 8B 0.69 / 34 1 / 8 9 Lottia gigantea Follistatin 102759 shows the result of phylogenetic analysis of Tolloid 1 / 100 Pomatoceros lamarckii Follistatin sequences from a range of metazoan species, 0.97 / 51 Capitella teleta Follistatin 186626 rooted with the homologue found in N. vectensis. Schmidtea mediterranea Follistatin AFJ24723.1 A single Tolloid homologue was found in both 1 / 84 Schistosoma mansoni Follistatin XP 002571695.1 mollusc species examined, in P. lamarcki, and in Nematostella vectensis Follistatin XP 001624524.1 all schistosome species (S. mansoni shown on 0.2 tree). It appears to have been lost from rotifer genomes, although A. vaga possesses a number B 1 / 84 Drosophila melanogaster Tolkin AAF56328.1 of similar metalloproteinases with high similarity 1 / 49 Drosophila melanogaster Tolloid AAF56329.2 to NAS proteins and teleost hatching 0.96 / 37 Tribolium castaneum Tolloid XP970162.1 Capitella teleta divergent Tolloid 224057 enzymes that may have been acquired by hori- 1 / 79 0.65 / 25 Schistosoma mansoni Tolloid XP002573725.1 zontal gene transfer from those species. C. teleta Pomatoceros lamarckii Tolloid 0.60 / * Capitella teleta Tolloid 221124 possesses two homologues, one with the canonical 0.95 / 52 1 / 100 Patella vulgata Tolloid domain structure (inset, Fig. 8B) and one with a * / 38 ! 1 / 99 Lottia gigantea Tolloid 236092 Stereotypical Tolloid highly divergent structure, although this does not Branchiostoma floridae BMP1 (Tolloid) AAX84844.1 appear to be the result of mis-annotation of the 1 / 98 Mus musculus Tolloid-like protein 2 NP036034.1 C. teleta genome, and no equivalent is found in 0.82 / 52 Homo sapiens Tolloid-like protein 2 NP036597.1 Gallus gallus Tolloid-like 2 XP003641587.1 Capitella teleta divergent Tolloid 224057 1 / 97 H. robusta or P. lamarcki. Two Tolloids (Tolloid 1 / 92 Gallus gallus Tolloid-like protein 1 NP990034.1 = region used for phylogenetic analysis 1 / 98 and Tolkin) are found in D. melanogaster, and 0.66 / 45 Homo sapiens Tolloid-like protein 1 NP036596.3 our analysis corroborates the suggestion of van Danio rerio Tolloid-like protein 1 NP571085.1 1 / 72 Homo sapiens BMP1 NP006120.1 der Zee et al., (2008) that this Tolkin homologue 1 / 82 Danio rerio BMP1a NP001035126.1 is in fact a lineage specific paralogue, specific to 1 / 87 Danio rerio BMP1b NP001034901.1 the Diptera rather than found protostome-wide. Nematostella vectensis Tolloid EDO41783.1

The roles played by Tolloid in lophotrochozoan 0.2 development are yet to be fully explored, but

Herpin et al., 2007 have noted that C. gigas Tol- C 1 / 99 Patella vulgata SMURF loid is capable of ventralising zebrafish embryos. 0.99 / 83 Lottia gigantea SMURF 81273 0.68 / 39 Crepidula fornicata SMURF ADI48176.1 Furthermore, C. gigas Tolloid is maternally depos- Capitella telata SMURF 153266 1 / 68 ited into the oocyte, and may play an early role in .97/7 0 Pomatoceros lamarckii SMURF patterning body axes in this species. A conserved Apis mellifera SMURF XP396318.4 Tribolium castaneum lethal with a checkpoint kinase SMURF XP966429.2 role for Tolloid in dorsoventral polarity establish- 0.99 / 85 1 / 100 0.99 / 77 Drosophila melanogaster lethal with a checkpoint kinase SMURF NP523779.1 ment in the Lophotrochozoa is therefore possible, 1 / 94 Danio rerio SMURF2 NP001107898.1 although more investigation is required to confirm 1 /100 Homo sapiens SMURF2 NP073576.1 this, especially outside the Mollusca. 0.76 / 64 Danio rerio SMURF1 NP001001943.1 1 / 99 Homo sapiens SMURF1 AAF08298.2 Mnemiopsis leidyi SMURF AEP16400.1 SMURFs Homo sapiens Wwp2 AAC51325.1 SMURF (SMAD specific E3 ubiquitin protein ligase) proteins regulate TGF-b signalling by a 0.1 number of mechanisms, generally involving the Fig. 8. Follistatin (A), Tolloid (B) and Smurf (C) interrelationships across the Metazoa, targeting of R-Smads for degradation, but other as determined by maximum likelihood (Tamura et al., 2011) and Bayesian (Huelsenbeck roles have also been posited for these proteins. and Ronquist 2001) methods. Phylogeny shown is the result of ML analysis, with differ- Two SMURF genes are characterised in verte- ences in topology using Bayesian methods indicated with a dotted line. Follistatin phylogenies brates, but to date only single orthologues have inferred on the basis of a MAFFT alignment (Katoh and Standley 2013, E-INS-i strategy) with 227 informative sites, analysed under the WAG model (Whelan and Goldman 2001), rooted with N. vectensis Follistatin (XP 001624524.1). Tolloid phylogeny shown generated according to the JTT model (Jones et al., 1992, ML)/Blosum model (Henikoff and Henikoff 1992, Bayesian) from a 150 informative amino acid alignment spanning the calcium-binding EGF domain and immediately proceeding the Cub domain as shown on inset, generated using MAFFT under the G-INS-i strategy and rooted with N. vectensis Tolloid (EDO41783.1). Inset shows the domain structure of canonical Tolloid proteins, along with that of the C. teleta divergent paralogue, presented from the Pfam database. The Smurf phylogeny was calculated under the JTT model, based on a 233 informative amino acid alignment generated using MAFFT using the G-INS-i strategy with outgroup specified as H. sapiens Wwp2 (AAC51325.1). Posterior probabilities/ bootstrap percentage (of 1,000 replicates) and can be seen at the base of nodes. Sequences used in phylogenetic analysis, along with alignment, can be found in Supplementary File 1. Lophotrochozoan sequences underlined in red. Scale bars represent substitutions per site at given distances. The Lophotrochozoan TGF-b signalling cassette 545 been found in sequenced invertebrate species. Our phylogenetic deuterostomes. analysis (Fig. 8C) of these genes from species across the Metazoa Fig. 9A shows the results of phylogenetic inference into the suggests a paralogous relationship between the two homologues inter-relationships of BAMBI homologues from a variety of spe- found in vertebrate model species. cies in the Protostomia and Deuterostomia. No BAMBI orthologue SMURF proteins appear to have originated within the early meta- could be identified in C. elegans, or in the Schistosome species zoan lineage, and clear homologues can be identified for these in or Rotifera sampled. Hydra magnipapillata, M. leidyi and T. adhaerens, although not to We note that a putative BAMBI sequence has been described date in N. vectensis. While single canonical SMURF orthologues are in a previous publication in S. mediterranea (Gavino and Red- readily identifiable in a variety of protostome species, no SMURF dien 2011), but in our investigations this sequence (ADX42731.1) proteins can be found in the genomes of C. elegans, A. vaga, B. appears to more readily resemble canonical serine/threonine plicatilis or the schistosome species. These species do, however, kinase receptors, clustering with these sequences in the course possess other E3 ubiquitin-protein ligases that readily cluster with of phylogenetic analysis, although the partial sequence provided nedd-4-like E3 ubiquitin-protein ligases, and are probably homo- does not include the intracellular serine/threonine kinase domain logues of this class of protein. We note that two Crepidula fornicata required for signalling. We are unable to test whether this domain (Mollusca) SMURF homologues are present in the NCBI nr dataset, absence is the result of fragmentary sequence or is truly present although one (ADI48175.1) is only a fragmentary sequence with in the complete protein, but if it is the latter this may represent the 100% similarity to its homologue at the amino acid level, and as re-evolution of a trait present more generally across the Metazoa. such was not used in our phylogenetic analysis. NOMO BMP and activin membrane-bound inhibitor (BAMBI) NOMO (Nodal Modulator, previously known as pM5) is known BMP and activin membrane-bound inhibitor is a pseudoreceptor for its role in directly antagonising Activin/Nodal signalling, in con- that competes with true Type II receptors for ligand binding (Onich- cert with Nicalin, with which it forms a transmembrane complex tchouk et al., 1999). The presence of this gene has been noted in at the endoplasmic reticulum (Haffner 2004). These complexes protostomes previously, particularly by van der Zee et al., (2008) are similar to complexes that regulate g-secretase activity, but and Huminiecki et al., (2009). Searches through the genomes of a do not perform the same roles, as shown in Zebrafish rescue as- variety of non-bilaterian metazoans revealed no trace of a BAMBI says. Instead they have been shown to regulate the formation of homologue, and it thus seems likely that BAMBI emerged on the mesendoderm by attenuating Nodal signalling. NOMO is, however, lineage leading to the last common ancestor of protostomes and still under-researched, with little known of its molecular mode of

0.61 / 59 Gallus gallus BAMBI NP001182330.1 A 1 / 97 Homo sapiens BAMBI NP036474.1 0.77 / 50 Danio rerio BAMBI NP571859.1 Branchiostoma floridae BAMBI XP002589511.1 0.99 / 62 0.96 / 56 Saccoglossus kowalevskii BAMBI NP001158365.1 Fig. 9. Bambi (A) and NOMO (B) interrelationships Pomatoceros lamarckii BAMBI across the Metazoa, as determined by maximum Capitella teleta BAMBI 220461 1 / 71 likelihood (Tamura et al., 2011) and Bayesian 1 / 100 0.96 / 65 Lottia gigantea BAMBI 151562 (Huelsenbeck and Ronquist 2001) methods. 0.99 / 77 Patella vulgata BAMBI Phylogeny shown is the result of ML analysis, with

1 / * Tribolium castaneum BAMBI EFA10728.1 differences in topology using Bayesian methods Nasonia vitripennis BAMBI XP003424368.1 * / 3 5 indicated with dotted lines. Bambi phylogenies 1 / 92 Bombus terrestris BAMBI XP003400296.1 determined using the JTT model (Jones et al., 0.99 / 92 Apis mellifera BAMBI XP001120213.2 1992), NOMO phylogenies under the WAG model Homo sapiens TGFbeta receptor 1 NP004603.1 (Whelan and Goldman 2001). Bambi phylogeny based on 73 informative amino acid alignment cre- 0.2 ated using MAFFT (Katoh and Standley 2013) under Homo sapiens NOMO3 NP001004067.1 the L-INS-i model, spanning the transmembrane 1 / 100 B Homo sapiens NOMO2 NP001004060.1 1 / 98 domain, incorporating some conserved regions both Homo sapiens NOMO1 AAH65535.1 1 / 100 extra- and intracellularly. NOMO phylogeny based on Gallus gallus NOMO/Pm5 XP414903.2 0.75 / 67 G-INS-i alignment by MAFFT, with 439 informative Danio rerio NOMO XP002662361.1 1 / 69 positions. Bambi phylogeny rooted with H. sapiens Saccoglossus kowalevskii NOMO ADB22613.1 TGF-b receptor 1 NP004603.1, NOMO tree with an Branchiostoma floridae NOMO XP002587131.1 1 / 82 apparent NOMO sequence found in Arabidopsis Patella vulgata NOMO 1 / 99 thaliana (NP_191795.1). Numbers at node reflect Lottia gigantea NOMO 104485 Bayesian posterior probabilities/bootstrap support 1 / 74 0.77 / 51 Capitella teleta NOMO 179513 (1,000 replicates, JTT model/WAG model, all default 1 / 77 Pomatoceros lamarckii NOMO 0.64 / 45 Drosophila melanogaster CG1371 NP610551.1 priors) respectively. Lophotrochozoan sequences underlined in red. Posterior probabilities/bootstrap 1 / 100 1 / 100 Anopheles gambiae NOMO XP 315882.4 Brachionus plicatilis NOMO percentage (of 1,000 replicates) and can be seen Schistosoma mansoni NOMO XP002575324.1 at the base of nodes. Sequences used in phyloge- Arabidopsis thaliana Carbohydrate-binding-like fold-containing NP 191795.1 netic analysis, along with alignment, can be found in Supplementary File 1. Scale bars represent 0.2 substitutions per site at given distances. 546 N.J. Kenny et al. action (Dettmer 2010). B. plicatilis and the planarian S. mansoni. Extensive loss seems to Phylogenetic analysis of NOMO sequence from a range of have occurred at the ligand and regulatory levels in these Phyla, metazoans can be seen in Fig. 9B, rooted with an apparent NOMO with Smad and receptor diversity relatively unchanged, particularly sequence found in Arabidopsis thaliana. NOMO is found in every in S. mansoni. genome examined in this work, implying a crucial role in cell sig- These apparent lost genes could be missing from our datasets nalling, even in species where Nodal is not found (for example, due to inadequate assembly of these genomes. This is more likely in the Ecdysozoa). for B. plicatilis than for S. mansoni, as this genome results from NOMO-like sequences are well described outside the Metazoa a much deeper sequencing effort. Other planarian genomes also and are suggested to be found throughout the Eukaryota (Homolo- exist, allowing assessment by comparison to outgroups in cases of Gene:13810). It seems that NOMO is particularly poorly named loss. Analyses of the B. plicatilis dataset, however, implies that the given its conservation outside of clades where Nodal exists, and substantial majority of genes present in that species are recovered in many cases where TGF-b signalling is entirely absent. Some by the genome dataset used here, and while some missing genes recent analyses have suggested that NOMO is involved in the are perhaps to be expected from our analysis, we would not expect regulation of nicotinic acetylcholine receptor functionality (Almedom these to be numerous. et al., 2009, for example), and this, along with the widespread con- It should be noted that the TGF-b componentry of S. mansoni servation of NOMO throughout the Eukaryota, suggests that this is very similar to that of S. japonicum and in most cases to that complex has been recruited by metazoans for further regulation of S. mediterranea, and it has been chosen as a representative of TGF-b signalling – although much mechanistic work is required and very well described member of its Phylum for the purposes of to untangle how this regulation is performed. comparison. While some differences exist – S. mediterranea, for example, has eight Noggin sequences listed on Genbank – these Discussion appear to be the result of lineage specific loss or gain, and theS. mediterranea genome is still generally cited as a draft resource. General Lophotrochozoan TGF-b componentry and the evolu- It is possible that more basal planarian species will exhibit a more tion of TGF-b signalling complete TGF-b complement than the species currently sampled. The results of our analysis of lophotrochozoan TGF-b cas- Any loss of regulatory elements is of interest, as the fine-scale settes can be seen in Table 1 and for TGF-b ligands in particular regulation of TGF-b ligands is known to play a variety of key roles in Fig. 10. In general, the complements of the annelid and mollusc in the establishment of bodily axes in other Phyla, for example, species considered in our analysis resemble that of invertebrate in the establishment of dorsoventral polarity and the germ layers deuterostomes more than they resemble those of ecdysozoan in Xenopus laevis (Hill 2001). How lophotrochozoans accomplish models. Ligand diversity seems more pronounced than that seen these tasks, perhaps without the aid of these modulators of signal- in the Ecdysozoa, and many regulatory components, such as the ling, remains to be established. noggin class, seem to be the result of Ecdysozoa-specific, rather While the internal phylogeny of the lophotrochozoan clade is than protostome-wide, loss. yet to be fully resolved, most studies place molluscs and annelids The most striking differences between the cassettes of other as sister groups (in the Trochozoan clade, along with brachiopods, metazoans and our sampled datasets are in the case of the rotifer phoronids and nemerteans), with planarian and rotifer data implying that these Phyla are only distantly related to

BMP-Like TGF like the Trochozoa sensu stricto. While our sam-

BMP pling may not allow us to trace the evolution Myostatin/Myoglianin BMP of the lophotrochozoan TGF-b complement 5/6/7/8 (Gbb/Scw) GDF1/3 Univin Vg across the entirety of its constituent Phyla, Maverick/GDF2 9/10 GDF 5/6/7 Activin/Inhibin BMP2/4BM (Dpp)P GDF9/BMP Lefty/Antivin Types of Ligands: we can be confident that the last common 3/GDF 10

Others ancestor of the Trochozoa, Rotifer and ADMP Nodal TGF ALP 15 Platyhelminthes will be relatively closely related in molecular complement to the Homo sapiens Urlophotrochozoan (the common ancestor Strongylocentrotus purpuratus Drosophila melanogaster of all lophotrochozoans), and our sampling Apis mellifera thus allows us to draw inference as to the Capitella teleta cassette of that hypothetical organism, by Pomatoceros lamarckii Lottia gigantea mapping gain and loss onto a schematic Patella vulgata phylogeny (Fig. 11). Adineta vaga Our work therefore suggests that we Brachionus plicatilis might expect the TGF-b signalling comple- Schistosoma mansoni Nematostella vectensis ment of the Urlophotrochozoan to be com- Mnemiopsis leidyi plete, with only the Dan clade completely missing from the ancestrally shared regula- Fig. 10. TGF-b ligand family presence and absence in a range of metazoan species, as deter- tory cassette across all lophotrochozoan mined in the present manuscript or inferred from previously published work. Only species with Phyla, and no lophotrochozoan-wide loss clearly informative TGF-b ligand identity known are listed above. Confirmed identity is shown in green (no lines), tentative identity in orange (single diagonal line) and absence in red (two diagonal lines, seen in the ligand complement. Loss is, forming a cross). however, prevalent in gene families in the The Lophotrochozoan TGF-b signalling cassette 547 planarians and in the rotifer species sampled, implying that a di- ter Superphyla were described, as often one or other lacked a verse TGF-b complement may not be necessary for these clades, gene completely. This has and will continue to allow us to infer although the reasons why these genes have been lost while being ancestral roles for a variety of genes, a vital step in understanding so well conserved in other lineages remains unknown. how animal life in general, and the TGF-b signalling cascade in It should be noted that our findings suggest that the use of the particular, has evolved. phrase “TGF-b signalling” represents a historical accident, rather than a unified nomenclature reflecting a diversity of TGF-b ligands. Conclusions The considerable momentum of the published literature means that this classification scheme is unlikely to be challenged, but we would Here we have presented the first systematic treatment of the suggest that “BMP signalling” is a more representative term for the TGF-b signalling cascade in the Lophotrochozoa. This work will ligand superclass and cascade as a whole, with TGF-b ligands be- provide the building blocks to allow us to understand a variety of ing a probable deuterostome novelty. The TGF-b-like class would developmental and homeostasis-related signalling cascades in then perhaps be more properly termed the Activin/Myostatin-like these species, and also provide a valuable resource for tracing class. However, we recognise the historical contingencies inher- the ancestral functionality of the TGF-b pathway. ent in the current naming scheme, and have named our ligands We have shown that while some aspects of the TGF-b signalling accordingly in this manuscript. pathway are very highly conserved in the Lophotrochozoa when In general, however, the TGF-b ligand signalling cassette is well compared to other Superphyla and basal clades, some aspects, conserved in the Lophotrochozoa, which will allow us to compare notably the TGF-b-like ligand complement, are highly derived, and functional roles and expression of these elements between this differ greatly, even when compared between closely related species. Superphylum and the Deuterostomia and the Ecdysozoa – a The loss of regulatory elements of the TGF-b signalling cascade practice that has not always been possible when only these lat- is also particularly intriguing, as without the modulation of signalling provided by these molecules lophotrochozoans may have Non-Bilatarian Non-Vertebrate evolved alternative means of interpreting the levels of signal Ecdysozoa Lophotrochozoa Vertebrata Metazoa Deuterostomia provided by TGF-b ligands, without relying on gradient control. Further research will be required to discern how this occurs. BMP 3 Activin/ This data will act as the starting point for a number of ALP Nodal GDF 9 Maverick Myostatin-likes GDF1/3 investigations into the function, evolution and diversification of these molecules in this under-represented Superphylum,

Lefty and fills in the last major gap remaining in our understanding of the diversity of this cascade across metazoan life.

Nodal, Maverick Materials and Methods ADMP, BMP2/4, BMP3, BMP 5-8, BMP 9/10, Activin, Myostatin Gains in Green at Left Gene identification Losses in Red at Right Gene sequences were derived from P. lamarcki, P. vulgata, B. Metazoan Common Novelties found only in one Ancestor species not mapped. plicatilis (Transcriptome sequences as described in Werner et al., A 2012, Kenny and Shimeld 2012, genomic sequence manuscripts in prep), S. mansoni (Berriman et al., 2009), L. gigantea and C. teleta (Simakov et al., 2013) genomic sequences, downloaded to a local server. A. vaga sequences were taken from http://www.genoscope. Non-Bilatarian Non-Vertebrate Ecdysozoa Lophotrochozoa Vertebrata cns.fr. These were identified using tBlastn (Altschul et al., 1990) of Metazoa Deuterostomia conserved regions of known gene sequence against each dataset. Genes thus putatively identified were then reciprocally blasted

Noggin-like against the NCBI nr database using Blastx to further confirm their Tolkin Dan Chordin-like identity. Conversion into protein sequence was carried out using the EMBOSS Transeq (http://www.ebi.ac.uk/Tools/st/emboss_transeq/) tool, assuming standard codon usage. Dante Phylogenetic analysis Bambi, Tsg Gene sequences were aligned with those of known identity downloaded from the NCBI nr database, using MAFFT version 7 Chordin/Chordin-like, Noggin/ Noggin-like, Follistatin, Gremlin, (Katoh and Standley 2013) under the G-INS-i strategy unless oth- Dan, Tolloid, NOMO, SMURF Gains in Green at Left Losses in Red at Right erwise stated, and alignments saved in fasta format and imported Metazoan Common Clade-wide Loss Shown into MEGA 5 (Tamura et al., 2011) for alignment curation. Conserved B Ancestor domains were identified and alignments trimmed to these areas for further analysis, with sections of alignment containing gaps excluded Fig. 11. Gain and loss of TGF-b ligands (A) and modulators of TGF-b signalling in all cases for the purpose of phylogenetic inference. (B) across the Metazoa, mapped onto a schematic cladogram of the inter- Maximum likelihood analysis was performed using MEGA5 relationships of these Superphyla. Position of TGF-b itself is contingent on the (Tamura et al., 2011) using the WAG model (Whelan and Goldman true assignation of TGF-b status to M. leidyi protein sequence by Pang et al., 2011, 2001) unless otherwise stated, 1,000 bootstrap replicates and all as this hypothesis is only weakly supported by our analysis. other default prior settings. Bayesian phylogenetic analysis was 548 N.J. Kenny et al.

performed with MrBayes v3.2.1-x64 software (Huelsenbeck and Ronquist is an evolutionary novelty. Dev Biol 377: 245-261. 2001) using the WAG model (Whelan and Goldman 2001) of amino acid DUNN C W, HEJNOL A, MATUS D Q, PANG K, BROWNE W E, SMITH S A, et al., substitution unless otherwise stated, after initial identification using mixed (2008) Broad phylogenomic sampling improves resolution of the animal tree of models. The Monte Carlo Markov Chain search was run over 1,000,000 life. Nature 452: 745-749 generations, unless otherwise stated in figure legends, and trees were FLEURY E, FABIOUX C, LELONG C, FAVREL P and HUVET A (2008). Character- sampled every 1,000 generations, with the first 25 % of trees thus gath- ization of a gonad-specific transforming growth factor-beta superfamily member ered discarded as ‘burn-in’. For the purposes of tree display, Contype was differentially expressed during the reproductive cycle of the oyster Crassostrea configured to Allcompat, but all other priors remained at default. gigas. Gene 410: 187-196. Gene sequences are firmly assigned homology to known genes where FLOT J-F, HESPEELS B, LI X et al., Genomic evidence for ameiotic evolution in the both maximum likelihood and Bayesian trees agree on the placement of bdelloid rotifer Adineta vaga. Nature 500: 453-457. these sequences within a clade of genes of known identity, and at least FORRESTER S G, WARFEL P W and PEARCE E J (2004). Tegumental expression one phylogenetic reconstruction method shows node support with these of a novel type II receptor serine/threonine kinase (SmRK2) in Schistosoma genes of known homology with a bootstrap value greater than or equal to mansoni. Mol Biochem Parasitol 136: 149-156. 80, or posterior probability support greater than or equal to 0.9. The suffix FREITAS T C, JUNG E and PEARCE E J (2007). TGF-b signaling controls embryo ‘-like’ is affixed to gene names where these criteria are not met. development in the parasitic flatwormSchistosoma mansoni. PLoS Path 3: e52. GAVINO M A and REDDIEN P W (2011). A Bmp/Admp Regulatory Circuit Controls Acknowledgements Maintenance and Regeneration of Dorsal-Ventral Polarity in Planarians. Curr We give our thanks to all the members of our lab groups, whose time and Biol 21: 294-299. discussions contributed much to this manuscript. Funds for sequencing P. GRANDE C and PATEL N H (2009). Nodal signalling is involved in left-right asym- lamarcki and P. vulgata were provided by the Elizabeth Hannah Jenkinson metry in snails. Nature 457: 1007-1011. Fund. Funds for sequencing B. plicatilis were provided by a Royal Society GRANDE C, MARTIN-DURAN J M, KENNY N J, TRUCHADO-GARCIA M and HEJNOL of New Zealand Marsden Grant (UOO0602) to PKD. NJK was supported A (2014). Evolution, divergence and loss of the Nodal signalling pathway: new by a Clarendon Fund studentship and much of this work was completed data and a synthesis across the Bilateria. Int J Dev Biol 58: 521-532. with the aid of a Santander Academic Travel Award in the lab of CG. CG is HAFFNER C, FRAULI M, TOPP S, IRMLER M, HOFMANN K, REGULA JT, BALLY- currently a ‘‘Ramon y Cajal’’ postdoctoral fellow supported by the Spanish CUIF L, HAASS C (2004). Nicalin and its binding partner Nomo are novel Nodal MICINN and the UAM and funded by project CGL2011-29916 (MICINN). signaling antagonists. EMBO J 23: 3041-3050. Our thanks also to Professor Henry, the editor of this special issue, and to HELDIN C H and MOUSTAKAS A (2012). Role of Smads in TGF-b signaling. Cell two anonymous reviewers for their helpful comments on this work. Tis Res 347: 21-36. HENIKOFF S and HENIKOFF J G (1992) Amino acid substitution matrices from References protein blocks. Proc. Natl. Acad. Sci. USA 89: 10915-10919. HERPIN A, LELONG C and FAVREL P (2004). Transforming growth factor-b-related ADAMSKA M, DEGNAN S M, GREEN K M, ADAMSKI M, CRAIGIE A, et al., (2007). proteins: an ancestral and widespread superfamily of cytokines in metazoans. Wnt and TGF-b Expression in the Sponge Amphimedon queenslandica and the Dev Compar Immunol 28: 461-485. Origin of Metazoan Embryonic Patterning. PLoS ONE 2: e1031. HERPIN A, LELONG C, BECKER T, ROSA F, FAVREL P and CUNNINGHAM C (2005). AL-SALIHI M A, HERHAUS L, SAPKOTA G P (2012). Regulation of the transforming Structural and functional evidence for a singular repertoire of BMP receptor signal growth factor b pathway by reversible ubiquitylation. Open Biol 2:120082. dx.doi. transducing proteins in the lophotrochozoan Crassostrea gigas suggests a shared org/10.1098/rsob.120082. ancestral BMP/activin pathway. FEBS J 272: 3424-3440. ALMEDOM R B, LIEWALD J F, HERNANDO G, SCHULTHEIS C, RAYES D, PAN HERPIN A, LELONG C, BECKER T, FAVREL P, and CUNNINGHAM C (2007). A J, SCHEDLETZKY T, HUTTER H, BOUZAT C, GOTTSCHALK A (2009). An ER- tolloid homologue from the Pacific oyster Crassostrea gigas. Gene Expr Pat- resident membrane protein complex regulates nicotinic acetylcholine receptor terns 7 (6): 700-708. subunit composition at the synapse. EMBO J 28(17): 2636-2649. HILL C S (2001). TGF-b signalling pathways in early Xenopus development. Curr ALTSCHUL S, GISH W, MILLER W, MYERS E and LIPMAN D (1990). Basic local Opin Genet Dev 11: 533-540. alignment search tool. J Mol Biol 215: 403-410. HOLLEY S A, JACKSON P D, SASAI Y, LU B, DE ROBERTIS E M, HOFFMANN F M, BALEMANS W and VAN HUL W (2002). Extracellular regulation of BMP signaling in and FERGUSON E L (1995). A conserved system for dorsal-ventral patterning in vertebrates: a cocktail of modulators. Dev Biol 250: 231-250. insects and vertebrates involving sog and chordin. Nature 376 (6537): 249-253. BERRIMAN M, HAAS B J, LOVERDE P T, WILSON R A, DILLON G P, CERQUEIRA G HUELSENBECK J P and RONQUIST F (2001). MRBAYES: Bayesian inference of C, MASHIYAMA S T, AL-LAZIKANI B, ANDRADE L F, ASHTON P D et al., (2009). phylogenetic trees. Bioinformatics 17: 754-755. The genome of the blood fluke Schistosoma mansoni. Nature 460: 352-358. HUMINIECKI L, GOLDOVSKY L, FREILICH S, MOUSTAKAS A, OUZOUNIS C and b CHENG S K, OLALE F, BENNETT J T, BRIVANLOU A H and SCHIER A F (2003). HELDIN C H (2009). Emergence, development and diversification of the TGF- EGF-CFC proteins are essential coreceptors for the TGF-b signals Vg1 and signalling pathway within the animal . BMC Evol Biol 9: 28. GDF1. Genes Dev 17: 31-36. ITOH S and TEN DIJKE P (2007). Negative regulation of TGF-b receptor/Smad signal DAVIES S J, SHOEMAKER C B and PEARCE E J (1998). A Divergent Member of transduction. Curr Opin Cell Biol 19: 176-184. the Transforming Growth Factor b Receptor Family from Schistosoma mansoni Is JONES D T, TAYLOR W R, THORNTON J M (1992). The rapid generation of mutation Expressed on the Parasite Surface Membrane. J. Biol Chem 273: 11234-11240. data matrices from protein sequences. Computer applications in the biosciences: DAYHOFF M O, SCHWARTZ R M, ORCUTT B C (1978). A model of evolutionary CABIOS 8 (3): 275-282. change in proteins. In Atlas of protein sequence and structure Volume 5 (Ed M O KATOH K, STANDLEY D M (2013). MAFFT Multiple Sequence Alignment Software Dayhoff) National Biomedical Research Foundation, Washington DC, pp 345-351. Version 7: Improvements in Performance and Usability. Mol Biol Evol 30: 772-780. DERYNCK R and MIYAZONO K (2008). TGF-b and the TGF-b family. In The TGF-b KENNY N and SHIMELD S (2012). Additive multiple k-mer transcriptome of the keel- Family (Eds. R DERYNCK and K MIYAZONO). Cold Spring Harbor Laboratory worm Pomatoceros lamarcki; (Annelida; Serpulidae) reveals annelid Press, New York, pp. 29-43. transcription factor cassette. Dev Genes Evol 222: 325-339. DETTMER U, KUHN P-H, ABOU-AJRAM C, LICHTENTHALER S F, KRUGER M, KUO D-H and WEISBLAT D A (2011). A New Molecular Logic for BMP-Mediated KREMMER E, HAASS C, HAFFNER C (2010). Transmembrane Protein 147 Dorsoventral Patterning in the Leech Helobdella. Curr Biol 21: 1282-1288. (TMEM147) Is a Novel Component of the Nicalin-NOMO Protein Complex. J Biol LAPRAZ F O, ROTTINGER E, DUBOC V R, RANGE R, DULOQUIN L, WALTON Chem 285(34): 26174-26181. K, WU S-Y, BRADHAM C, LOZA M A, HIBINO T et al., (2006). RTK and TGF-b DUNCAN E J, BENTON M A, DEARDEN P K (2013). Canonical terminal patterning signaling pathways genes in the sea urchin genome. Dev Biol 300: 132-152. The Lophotrochozoan TGF-b signalling cassette 549

MASSAGUÉ J (1998). TGF-b Signal Transduction. Ann Rev Bioc 67: 753-791. SCOTT I C, BLITZ I L, PAPPANO W N, IMAMURA Y, CLARK T G, STEIGLITZ B M, MASSAGUÉ J (2012). TGF-b signaling in context. Nat Rev Mol Cell Bio 13: 616-630. THOMAS C L, MAAS S A, TAKAHARA K, CHO K W Y et al., (1999). Mammalian BMP-1/Tolloid-Related Metalloproteinases, Including Novel Family Member Mam- MASSAGUÉ J, BLAIN S W and LO R S (2000). TGF-b signaling in growth control, malian Tolloid-Like 2, Have Differential Enzymatic Activities and Distributions of cancer, and heritable disorders. Cell 103: 295-309. Expression Relevant to Patterning and Skeletogenesis. Dev Biol 213: 283-300. MASSAGUÉ J, SEOANE J and WOTTON D (2005). Smad transcription factors. SHEN M M and SCHIER A F (2000). The EGF-CFC gene family in vertebrate devel- Genes Dev 19: 2783-2810. opment. Trend Gene 16: 303-309. MATUS D Q, PANG K, MARLOW H, DUNN C W, THOMSEN G H and MARTINDALE SHI Y and MASSAGUÉ J (2003). Mechanisms of TGF-b signaling from cell membrane M Q (2006). Molecular evidence for deep evolutionary roots of bilaterality in animal to the nucleus. Cell 113: 685-700. development. Proc. Natl. Acad. Sci. USA 103: 11195-11200. SIMAKOV O, MARLETAZ F, CHO S-J, EDSINGER-GONZALES E, HAVLAK P, MOLINA M D, NETO A, MAESO I, GOMEZ-SKARMETA J L, SALO E and CEBRIA F HELLSTEN U, KUO D-H, et al., (2013). Insights into bilaterian evolution from (2011). Noggin and Noggin-Like Genes Control Dorsoventral Axis Regeneration three spiralian genomes. Nature 493: 526-531. in Planarians. Curr Biol 21: 300-305. SRIVASTAVA M, SIMAKOV O, CHAPMAN J, FAHEY B, GAUTHIER M E A, MITROS MOUSTAKAS A and HELDIN C H (2009). The regulation of TGF-b signal transduc- T, RICHARDS G S, CONACO C, DACRE M, HELLSTEN U et al., (2010). The tion. Development 136: 3699-3714. Amphimedon queenslandica genome and the evolution of animal complexity. NEDERBRAGT A J, VAN LOON A E and DICTUS W J (2002). Expression of Patella Nature 466: 720-726. vulgata orthologs of engrailed and dpp-BMP2/4 in adjacent domains during SUGA H, ONO K and MIYATA T (1999). Multiple TGF-b receptor related genes in molluscan shell development suggests a conserved compartment boundary sponge and ancient gene duplications before the parazoan/eumetazoan split. mechanism. Dev Biol 246: 341-355. FEBS Lett 453: 346-350. OELGESCHLAGER M, LARRAIN J, GEISSERT D, & DE ROBERTIS E M (2000). TAMURA K, PETERSON D, PETERSON N, STECHER G, NEI M and KUMAR The evolutionarily conserved BMP-binding protein Twisted gastrulation promotes BMP signalling. Nature, 405, 757-763. S (2011). MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol ONICHTCHOUK D, CHEN Y G, DOSCH R, GAWANTKA V, DELIUS H, MASSAGUÉ Evol. 28: 2731-2739. J and NIEHRS C (1999). Silencing of TGF-b signalling by the pseudoreceptor b BAMBI. Nature 401: 480-485. TEN DIJKE P and ARTHUR HM (2007). Extracellular control of TGF- signalling in vascular development and disease. Nat Rev Mol Cell Biol 8: 857-869. OSMAN A, NILES E G, VERJOVSKI-ALMEIDA S, LOVERDE P T. Schistosoma b mansoni TGF-b Receptor II: Role in host ligand-induced regulation of a schisto- VAN DER ZEE M, DA FONSECA R N and ROTH S (2008). TGF- signaling in some target gene. PLoS Pathog. 2006. e54 Tribolium: vertebrate-like components in a beetle. Dev Gene Evol 218: 203-213. PANG K, RYAN J F, BAXEVANIS A D and MARTINDALE M Q (2011). Evolution of VILMOS P, GAUDENZ K, HEGEDUS Z, and MARSH J L (2001). The Twisted gas- the TGF-b signaling pathway and its potential role in the ctenophore, Mnemiopsis trulation family of proteins, together with the IGFBP and CCN families, comprise leidyi. PLoS ONE 6: e24152. the TIC superfamily of cysteine rich secreted factors. Mol Pathol 54: 317-323. RENTZSCH F, ANTON R, SAINA M, HAMMERSCHMIDT M, HOLSTEIN T W and WAGNER G P (1996). Homologues, Natural Kinds and the Evolution of Modularity. TECHNAU U (2006). Asymmetric expression of the BMP antagonists chordin and Am Zool, 36: 36-43. gremlin in the sea anemone Nematostella vectensis: Implications for the evolution WERNER G D, GEMMELL P, GROSSER S, HAMER R, SHIMELD S M (2012). Analy- of axial patterning. Dev Biol 296: 375-387. sis of a deep transcriptome from the mantle tissue of Patella vulgata Linnaeus RICHARDS G S and DEGNAN B M (2009). The dawn of developmental signaling (Mollusca: : Patellidae) reveals candidate biomineralising genes. Mar in the Metazoa. Cold Spring Harb Sym 74, 81-90. doi: 10.1101/sqb.2009.74.028 Biotechnol 15: 230-243. ROSS S and HILL C S (2008). How the Smads regulate transcription. Int J Biochem WHELAN S, GOLDMAN N (2001). A General Empirical Model of Protein Evolution Cell Biol 40: 383-408. Derived from Multiple Protein Families Using a Maximum-Likelihood Approach. Mol Biol Evol 18: 691-699. SAKUTA H, SUZUKI R, TAKAHASHI H, KATO A, SHINTANI T, IEMURA S-I, YAMA- MOTO T S, UENO N, NODA M (2001). Ventroptin: A BMP-4 Antagonist Expressed YAMAMOTO Y and OELGESCHLÄGER M (2004). Regulation of bone morphogenetic in a Double-Gradient Pattern in the Retina. Science 293: 111-115. proteins in early embryonic development. Naturwissenschaften 91: 519-534. Further Related Reading, published previously in the Int. J. Dev. Biol.

Evolution, divergence and loss of the Nodal signalling pathway: new data and a synthesis across the Bilateria. C Grande, J M Martin-Duran, N J Kenny, M Truchado-Garcia and A Hejnol. Int. J. Dev. Biol. (2014) doi: 10.1387/ijdb.140133cg

On the origin of pattern and form in early Metazoans Frederick W. Cummings Int. J. Dev. Biol. (2006) 50: 193-208 http://dx.doi.org/10.1387/ijdb.052058fc

The first bilaterian organisms: simple or complex? New molecular evidence J Baguna, I Ruiz-Trillo, J Paps, M Loukota, C Ribera, U Jondelius, M Riutort Int. J. Dev. Biol. (2001) 45: S133-S134 http://dx.doi.org/10.1387/ijdb.01450133

Nodal signaling and the zebrafish organizer A F Schier & W S Talbot Int. J. Dev. Biol. (2001) 45: 289-298 http://www.ijdb.ehu.es/web/paper/11291859/

Functional analysis of the TGFbeta receptor/Smad pathway through gene ablation in mice M J Goumans & C Mummery Int. J. Dev. Biol. (2000) 44: 253-266. http://www.ijdb.ehu.es/web/paper/10853822 5 yr ISI Impact Factor (2011) = 2.959