Calcisponges Have a Parahox Gene and Dynamic Expression of 2 Dispersed NK Homeobox Genes 3

1 Calcisponges have a ParaHox gene and dynamic expression of 2 dispersed NK homeobox genes 3 4 Sofia A.V. Fortunato1, 2, Marcin Adamski1, Olivia Mendivil Ramos3, †, Sven Leininger1, , 5 Jing Liu1, David E.K. Ferrier3 and Maja Adamska1§ 6 1Sars International Centre for Marine Molecular Biology, University of Bergen, 7 Thormøhlensgate 55, 5008 Bergen, Norway 8 2Department of Biology, University of Bergen, Thormøhlensgate 55, 5008 Bergen, 9 Norway 10 3 The Scottish Oceans Institute, Gatty Marine Laboratory, School of Biology, University of 11 St Andrews, East Sands, St Andrews, Fife KY16 8LB, UK. 12 † Current address: Stanley Institute for Cognitive Genomics, Cold Spring Harbor 13 Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA. 14 Current address: Institute of Marine Research, Nordnesgaten 50, 5005 Bergen, Norway 15 §Corresponding author 16 Email address: [email protected] 17 18 Summary 19 Sponges are simple animals with few cell types, but their genomes paradoxically 20 contain a wide variety of developmental transcription factors1‐4, including 21 homeobox genes belonging to the Antennapedia (ANTP)class5,6, which in 22 bilaterians encompass Hox, ParaHox and NK genes. In the genome of the 23 demosponge Amphimedon queenslandica, no Hox or ParaHox genes are present, 24 but NK genes are linked in a tight cluster similar to the NK clusters of bilaterians5. 25 It has been proposed that Hox and ParaHox genes originated from NK cluster 26 genes after divergence of sponges from the lineage leading to cnidarians and 27 bilaterians5,7. On the other hand, synteny analysis gives support to the notion that 28 absence of Hox and ParaHox genes in Amphimedon is a result of secondary loss 29 (the ghost locus hypothesis)8. In this study, we analyzed complete suites of Antp 30 class homeoboxes in two calcareous sponges, Sycon ciliatum and Leucosolenia 31 complicata. Our phylogenetic analyses demonstrate that these calcisponges 32 possess orthologues of bilaterian NK genes (Hex, Hmx and Msx), a varying number 33 of additional NK genes and one ParaHox gene, Cdx. Despite generation of scaffolds 34 spanning multiple genes, we find no evidence of clustering of Sycon NK genes. All 35 Sycon Antp‐class genes are developmentally expressed, with patterns suggesting 36 involvement in cell type specification in embryos and adults, metamorphosis and 37 body plan patterning. The present study demonstrates that ParaHox genes 38 predate the origin of sponges, thus confirming the ghost locus hypothesis8, and 39 highlights the need to analyze genomes of multiple sponge lineages in order to 40 obtain a complete picture of the ancestral composition of the first animal genome. 41 Sponges (Porifera) are strong candidates for being the earliest extant lineage(s) of 42 animals9. The genome sequence of the demosponge Amphimedon queenslandica 43 provided rich material for comparative studies on the origins of metazoan 44 developmental genes, cell types and body plan1. Among others, it has fuelled hypotheses 45 about the origin of one of the most widely studied groups of developmental genes: the 46 Antennapedia (ANTP)class homeoboxes, including Hox, ParaHox and NK genes5,7,8,10,11. 47 Antp genes have been found in all animals and are involved in multiple developmental 48 processes, including body plan patterning and neurogenesis12. Hox, ParaHox and NK 49 genes are often found in clusters5,13, and in some animals their expression is temporally 50 or spatially correlated to their position within the cluster (this being known as 51 colinearity)12. The Amphimedon genome contains eight NK genes, but neither Hox nor 52 ParaHox genes are present5. Six NK genes are linked in a tight cluster, and their simple 53 embryonic and larval expression patterns are not consistent with colinearity5,6. Lack of 54 Hox and ParaHox genes in Amphimedon, and also in the ctenophore Mnemiopsis leidyi7, 55 has previously been interpreted as reflecting the ancestral condition, and gave rise to 56 the ParaHoxozoa hypothesis, in which all animal lineages apart from poriferans and 57 ctenophores are collectively known as the ParaHoxozoa. Others10 have interpreted the 58 phylogenetic evidence differently, suggesting that both Hox and ParaHox genes were 59 originally present in sponges, but have subsequently been lost. This view has been 60 recently revived by identification of Hox and ParaHox “ghost loci” (regions that display 61 synteny to bilaterian Hox and ParaHox loci, but lack the Hox/ParaHox genes themselves) 62 in the genome of Amphimedon8. 63 We expected that expanding the range of sequenced sponge genomes would provide 64 new information about the evolutionary history of genes important for the origin and 65 evolution of the animal kingdom. Calcisponges form a poriferan lineage which has been 66 separated from the demosponges for at least 600 million years9. We have recently 67 started analysis of the developmental toolkits of two calcisponges, Sycon ciliatum and 68 Leucosolenia complicata2,4,14. Here, we have searched for Antp‐class homeobox genes in 69 transcriptomic and genomic assemblies of these species. 70 We retrieved ten Antp‐class homeodomains in Sycon, constituting nine transcripts (one 71 with two homeoboxes), and twelve Antp‐class homeodomains in Leucosolenia, 72 constituting nine transcripts (one with four homeoboxes) (Supplementary dataset 1). 73 Our phylogenetic analyses demonstrate that the repertoire of Antp‐class genes is similar 74 between the two calcisponges, but strikingly different than in the demosponge 75 Amphimedon. Calcisponges and demosponges have clear orthologues of the bilaterian 76 genes Hex and Msx; calcisponges also have a clear NK5 (Hmx) orthologue, which seems 77 to be lacking in Amphimedon. In contrast, this demosponge has possible Bsh, BarH/BarI 78 and Tlx genes, which are not recognizable in calcisponges. While in Amphimedon there is 79 a single gene associated with the bilaterian NK2/3/4 clade15, several paralogues are 80 present in the two calcisponges. They contain multi‐homeobox genes with non‐ 81 orthologous relationships between Sycon and Leucosolenia, and other genes containing 82 single homeoboxes in the NK2/3/4 clade. Affiliation of Sycon and Leucosolenia NKB and 83 NKG and the LcoNKF genes with a particular bilaterian NK family is not clear. No Hox 84 genes were found; however, a pair of the calcisponge genes showing affinities with the 85 ParaHox Cdx subfamily given the concordance of Neighbour‐Joining (NJ) and Maximum 86 Likelihood (ML) analyses (Fig. 1, Extended Data Fig. 1). 87 Given the importance of this potential assignment, we performed further phylogenetic 88 analyses of these putative Cdx orthologues. In addition to the ELEKEF motif which is 89 shared by many Hox and ParaHox, but not NK‐type homeodomains, the Cdx family has 90 some distinctive residues in its homeodomain, most notably the YIT motif present only 91 in a small number of other ANTP class homeodomains (Supplementary note 1 and 92 Extended Data Fig. 2). Phylogenetic analyses focused on these few families in addition to 93 families represented in sponges, based on greater taxon sampling than in the overall 94 classification, produced a significantly supported clustering of SciCdx and LcoCdx with 95 Cdx genes from other species in NJ, ML and Bayesian analyses (Extended Data Figs. 3‐5). 96 We have also investigated the genomic neighbourhood of SciCdx to help resolve the 97 identity of this homeobox gene (Fig. 2; Supplementary note 2 and Supplementary Table 98 1). From the 14 genes on the SciCdx scaffold that have clear human orthologues 99 (Supplementary Table 1), four are orthologues of genes linked to ParaHox loci in 100 humans. One of these, SAR1A/B also has a conserved neighbouring relationship with the 101 ParaHox cluster in the cnidarian, Nematostella vectensis (Fig. 2a, b). Although these gene 102 numbers are insufficient to reach statistical significance, the neighbour relationships are 103 consistent with the identification of SciCdx as a ParaHox gene. Furthermore, as one 104 would expect from the ghost locus hypothesis and the identification of SciCdx as a bona 105 fide ParaHox gene, we also find clustering of Sycon orthologues of ParaHox and Hox 106 neighbour genes into two distinct groups in the Sycon genome to statistically significant 107 levels (Fig. 2c‐e). Altogether the evidence is consistent with the identification of SciCdx 108 and LcoCdx as the first examples of sponge ParaHox genes. 109 All of the Sycon NK genes are found on separate scaffolds (Extended data fig. 6) with 110 multiple additional genes surrounding the homeobox genes. We interpret this as the 111 ancient NK cluster having been broken apart in the Sycon genome. Alternatively, our 112 current assembly is not sufficient to provide evidence of a cluster with multiple genes 113 inserted between the NK genes. It has been previously shown that arrangements of NK 114 genes are variable between different species, ranging from intact and conserved NK 115 clusters5,16,17 to clusters that are partially broken15,18. 116 We studied the expression of Antp‐class genes in Sycon using a combination of in situ 117 hybridization with quantitative transcriptome analysis (Fig. 3, Supplementary note 3 118 and Extended Data Figs 7‐9). For all Antp‐class genes, except SciHex, expression can be 119 detected in oocytes and during cleavage (Fig. 3a and Extended Data Fig. 7a‐g). During 120 embryogenesis, the most striking expression domain of the majority of the identified 121 genes is the cruciform cells, which are putative larval sensory cells2,14. Beginning at the 122 four‐cell stage, stronger expression of SciNKA marks the cytoplasm destined to become 123 partitioned into the cruciform cells (Fig. 3d and Extended Data Fig. 7h‐q) and expression 124 of SciHmx is also markedly elevated in these cells (Fig. 3e). SciNKC and NKD are uniquely 125 and strongly expressed in the cruciform cells of more advanced (pre‐inversion stage) 126 embryos (Fig.

Calcisponges Have a Parahox Gene and Dynamic Expression of 2 Dispersed NK Homeobox Genes 3

Constraints on Genes Shape Long-Term Conservation of Macro-Synteny in Metazoan Genomes

University of Copenhagen, Zoological Museum, Review Universitetsparken 15, DK-2100 Copenhagen, Denmark CN, 0000-0001-6898-7655 Cite This Article: Nielsen C

About Synteny Analysis

“Parent-Daughter” Relationships Among Vertebrate Paralogs

Breakup of a Homeobox Cluster After Genome Duplication in Teleosts

Inferring Synteny Between Genome Assemblies: a Systematic Evaluation Dang Liu1,2, Martin Hunt3,4 and Isheng J Tsai1,2*

New Genomic Data and Analyses Challenge the Traditional Vision of Animal Epithelium Evolution

A Fast, General Synteny Detection Engine

Identifying Regions of Conserved Synteny Between

Cellular and Molecular Processes Leading to Embryo Formation In

The Evolution of Animal Genomes

Comparative Physical Mapping Links Conservation of Microsynteny to Chromosome Structure and Recombination in Grasses