*Revised Manuscript (unmarked)

1

EVOLUTIONARY HISTORY OF THE PODOPLANIN §

Jaime Renart1*, Diego San Mauro2, Ainhoa Agorreta2, Kim Rutherford3, Neil J.

Gemmell3, Miguel Quintanilla1

1Instituto de Investigaciones Biomédicas Alberto Sols, Consejo Superior de

Investigaciones Científicas (CSIC)-Universidad Autónoma de Madrid. Spain

2Department of Biodiversity, Ecology, and Evolution. Faculty of Biological Sciences.

Universidad Complutense de Madrid. 28040 Madrid. Spain

3Department of Anatomy, School of Biomedical Sciences. University of Otago, PO Box

56, Dunedin 9054, New Zealand

*Corresponding author:

Jaime Renart

Instituto de Investigaciones Biomédicas Alberto Sols, CSIC-UAM

Arturo Duperier 4. 28029-Madrid. Spain.

T: +34 915854412

[email protected]

§We wish to dedicate this publication to the memory of our friend and colleague Luis

Álvarez (†2016)

2

Keywords: PDPN, Evolution, Gnathostomes, exon/intron gain

Abbreviations: BLAST, Basic Local Alignment Search Tool; CT, cytoplasmic domain;

EC, extracellular domain; NCBI, National Center for Biotechnology Information;

PDPN, podoplanin; SRA, Sequence Read Archive; TAE, Tris Acetate-EDTA buffer;

PCR, polymerase chain reaction; UTR, untranslated region

3

ABSTRACT

Podoplanin is a type I small -like involved in cell motility. We have identified and studied the podoplanin coding sequence in 201 of vertebrates, ranging from cartilaginous fishes to mammals. The N-terminal signal peptide is coded by the first exon; the transmembrane and intracellular domains are coded by the third exon (except for the last amino acid, coded in another exon with a long 3’-UTR). The extracellular domain has undergone variation during evolutionary time, having a single exon in cartilaginous fishes, teleosts, coelacanths and lungfishes. In , this single exon has been split in two, and in amniotes, another exon has been acquired, although it has been secondarily lost in Squamata. The podoplanin evolutionary tree agrees with the species tree in most cases. Two functional elements in the , the

GXXXG dimerization motif in the transmembrane domain and the region where podoplanin is cleaved by -secretase are subjected to positive selection.

4

1. Introduction

Podoplanin (PDPN) is a small mucin-like type I transmembrane protein present in numerous tissues and often over-expressed in tumors, generally with poor prognosis (Renart et al. (2015); Ugorski et al. (2016) and references therein). Although podoplanin lacks any recognizable catalytic activity, the two domains of the protein, the

N-terminus extracellular (EC) and the C-terminus (CT) intracellular, play important roles in cellular physiology by protein-protein interactions (see Fig. 1 for the structure and sequence of the protein, as well as the origin of the different N-terminus described in section 2.2). The small (9 amino acids long) cytoplasmic C-terminus is able to interact with members of the ERM (ezrin, radixin and moesin) protein family. This affects cytoskeleton dynamics (increasing cell motility) and the formation or stabilization of different membrane structures, like filopodia, invadopodia or ruffles

(Martín-Villar et al., 2006; Martin-Villar et al., 2010; Martin-Villar et al., 2015). The extracellular domain of the protein is heavily O-glycosylated and interacts with a number of different proteins, like the C-type lectin receptor CLEC1B (present in platelets, endothelial lymphatic and dendritic cells), CD44, CD9, galectin8, CCL21 or

HspA9 (see related references in Renart et al., 2015). In particular, the interaction with

CLEC1B has been shown to be important in lymphangiogenesis, the correct functioning of the immune system, and metastasis. On the other hand, the interaction with CD44, has been shown to affect cell migration (Martin-Villar et al., 2010). The interactions with the other mentioned partners are less well studied, although CD9 could compete

CLEC1B-PDPN interaction (Nakazawa et al., 2008), and therefore potentially suppress tumor metastasis. 5

The crosstalk between the two domains of podoplanin is not well understood. We have shown that mutations in the trans-membrane (TM) domain affect its location within different membrane topological domains and its function as a promoter of epithelial to mesenchymal transition (Fernandez-Muñoz et al., 2011). This finding suggests that podoplanin location could affect interactions of the CT domain.

Podoplanin is a protein coded by a unique gene that is present only in jawed vertebrates (our search for PDPN coding sequences has been unable to find them in non-vertebrate lineages as well as in jawless fishes within vertebrates). The sequence of the protein, as will be discussed below, poses interesting questions on the evolution of podoplanin, as the apparent lack of connection between the extracellular- and intracellular-domains functions. Moreover, the evolution of the podoplanin gene shows acquisition of introns at specific points during the emergence of tetrapods

(amphibians plus amniotes), as well as acquisition and loss of an exon in amniotes and lepidosaurians, respectively.

2. Materials and methods

2.1. Ethics statement

A 1 ml blood sample was obtained from a large male tuatara (Sphenodon punctatus) from Lady Alice Island, New Zealand in November 2011 by Lindsay

Mickelson of Victoria University of Wellington. Sampling was conducted with ethical approval from VUW, the permission of Ngatiwai iwi, and was permitted by the New

Zealand Department of Conservation *NO-32037-RES). Muscle tissue was obtained from a specimen of Protopterus sp. purchased to a commercial pet dealer and sacrificed by snap freezing in liquid nitrogen. 6

2.2. Preliminary conventions

Most of the data presented here belong to non-model organisms for which genomic or transcriptomic sequences are not always complete and/or correctly annotated; in the majority of cases, protein sequences are bioinformatically predicted (in silico), based on similarity with orthologues. This is particularly important in the definition of 5’- and 3’-UTR regions. The mammalian podoplanin gene (studied mostly in human, mouse and rat) has a variable transcription start site, typical of TATA-less promoters with CpG islands (Antequera, 2003; Hantusch et al., 2007; Martin-Villar et al., 2009). Several reports in the databases show a peptide starting 76 amino acids before the canonical start site shown in Fig. 1A. As has already been discussed by our group (Martin-Villar et al., 2005), it is unlikely that this form is expressed in cells, as it lacks a signal peptide at the N-terminus. Similar reasoning can be applied to the 3’-

UTR. To eliminate these uncertainties, podoplanin gene is defined here from the ATG start to the stop codon.

The variability in exon number (that we discuss in section 3.3 below) occurs exclusively in the extracellular domain that is coded only by one exon in fishes, and three exons in mammals. Therefore, we have defined them as 2a, 2b and 2c (as shown in Fig. 1) to maintain the numbering of the exons constant in all remaining species

(exon1 for the signal peptide, exon3 for the transmembrane domain plus intracellular domain, and exon4 for the Pro-stop plus the 3’-UTR).

The podoplanin sequences are not clearly defined in the C-terminus. When the podoplanin sequence is derived from transcriptomic experiments, the final amino acids are GRYSP-stop or GRP-stop. However, when the sequence is derived from genomic sequences, we get GRYS-stop very often. As shown in Fig. 1C, the boundary between 7 exons 3 and 4 is located between the second and third bases of the Ser-157 codon

(referred to the human sequence). Because it is very common that the bases after de GT donor site are A, gene prediction algorithms read Ser and stop (boxed in the figure).

Also shown in the figure is an alternative GT donor site 6 bp upstream, thus eliminating the Tyr-Ser codons and reading the GRP-stop isoform. We have previously studied the occurrence of this alternative splicing, which occurs around 30% in human cells, and we did not find any correlation with tissue distribution or normal/malignant type of cells

(Martin-Villar et al., 2009).

2.3. Data retrieval

The species that we have studied are listed in Supplementary Table 1 with the

NCBI taxid ID and supra-generic taxonomic assignment. They all are jawed vertebrates, as no homologous sequences were found in non-vertebrate lineages, as well as in jawless fishes. The source of the sequences (database accession numbers or other) is also indicated. In the case of genomic contigs or SRA raw data, podoplanin sequences were identified using BLAST searches (Altschul et al., 1990), either locally or remotely at the NCBI server, with query sequences as phylogenetically close as possible to the species of interest, and final assembly performed with the Geneious-R10.1.2 program

(http://geneious.com, Kearse et al. (2012)). The sequence of Scaffold ScrUdWx_2583, that contains the podoplanin gene from Sphenodon punctatus was made available, prior to publication, by the Tuatara Genome Consortium.

2.4. Phylogeny reconstruction

Sequences were remotely aligned with MAFFT v-7.3.10 (Katoh and Standley,

2013) using the E-INS-I iterative refinement method. Alignments were manually 8 curated to eliminate degenerate nucleotides and eventually trimmed with Trimal

(Capella-Gutierrez et al., 2009) with the gappyout option, through the ETE3 framework

(Huerta-Cepas et al., 2016). Maximum likelihood (ML) analysis was performed with

RAxML v-8.2.10 (Stamatakis, 2014) through the CIPRES Science Gateway (Miller et al., 2010), with GTRCATI substitution model and computing 1000 bootstrap replicates.

Bayesian inference (BI) analysis (Ronquist et al., 2012) was conducted locally with

MrBayes v-3.2.6, running 4 chains, 107 generations with a sample frequency of 1000 and discarding the first 25% of samples as burn-in, with the GTR+G+I substitution model, selected following the BIC criterion in PartitionFinder v-1.1.1. Podoplanin sequences of cartilaginous fishes were used as outgroup in both ML and BI analyses. dN/dS (ω ratios) were computed with the CODEML program of the PAML v-4.9e package (Yang, 2007). Nucleotide alignments were generated with TranslatorX

(Abascal et al., 2010) to maintain the codon structure. Unrooted trees were performed with RAxML as before. CODEML was run at the CSIC computing facilities (AMD and

Intel, 48 GB node with RedHat 7.3 architecture x86_64 and 39 TB of disk space) or at an AMB-Opteron 6276 machine with 264 GB of RAM, 1 TB of disk space), running

Debian (buster) GNU Linux. Models used are described in the text and in Table 1. P- values were calculated in the RPKP web site

(http://www.fourmilab.ch/rpkp/experiments/analysis/chiCalc.html)

2.5. PCR amplification of Protopterus DNA

To determine if the lungfishes podoplanin gene has a single exon2, we assumed that exon1 would correspond exactly to the signal peptide, as it is the case in all other species studied, and exon3 could start some 8-10 codons before the GXXXG motif, as it is also the case in almost all other species. Interestingly, Neoceratodus forsteri does have the GXXXG motif, whereas Protopterus. annectens, P. dolloi and Lepidosiren 9 paradoxa do not. On the bases of these hypothesis, we designed two deoxyoligonucleotides that could amplify exon 2. Protopterus sp.: 5’-

CAAAACATCTCAGGTGATACAAATGAAAAACAGG (forward) and 5’-

CAATTATGACAGCAGTCTTTGTAACATCTTCTGC (reverse) with Tth DNA polymerase (BioTools) and using the following reaction conditions: 1 M deoxyoligonucleotides, 0.2 mM dNTPs, 360 ng Protopterus DNA in 50 l final reaction volume. Amplification conditions were: 94ºC for 3 min, 30 cycles of 94ºC for 30 s,

55ºC for 30 s, 72ºC for 90 s, and, finally, 72ºC for 7 min. An aliquot of the PCR product was electrophoresed in 2% agarose gel in TAE buffer. The rest of the reaction was cleaned with NZYGelpure columns (NZYTech) and ligated to 50 ng of pGEM-T easy vector (Promega) in 10 l reaction volume. The ligation reaction was transformed in

DH5 competent cells. Several colonies were amplified with the M13-D and M13-R universal primers using the same PCR conditions as before and sequenced in the

Genomics Facility of the Instituto de Investigaciones Biomédicas in an automated DNA sequencer ABI PRISM 3730-XL (Applied Biosystems) using the BigDye v3.1 Cycle sequencing kit (Applied Biosystems) and M13-D primer.

2.6. Completion of the Sphenodon punctatus coding sequence

Contig ScrUdWx_2583 contained exons 1, 2a and 2b, but no sequence complementary to exon3 could be detected. A partial sequence of exon3 was found in the partial transcriptome reported by Miller et al. (2012). As an assembly gap was present at around 14 kb downstream from exon2b, we hypothesized that it was probable that exon3 would be somewhere within this gap. We designed two primers, 1-

INTRON3-R (5’-GTCTAGTTTATTCCCCTGG), at the 3’ of the gap boundary and 3- 10

EXON3-R (5’-GACAGCATTGAAACAGCAA), spanning the first bases of exon3 found in the Miller transcriptome analysis. Similarly, we designed primers 2-EXON3-R

(5’-TGCAGTAGCAATTCCAATC) and 4-INTRON3-F (5’-

GCGAGCCAAATCAACAACC) located at the 5`of the gap boundary and the last known bases of exon3 in Miller’s transcriptome (see Supplementary Figure1 for details). PCR was conducted using the long-range Roche Expand Long Range dNTPack, according to manufacturer instructions with 300-500 ng S. punctatus DNA and the following touchdown protocol: 2 min at 92ºC,

10 cycles of 10 s at 92ºC, 15 s at 59ºC-50ºC (-1ºC per cycle), 8 min at 68ºC, 25 cycles of 10 s at 92ºC, 15 s at 50ºC, 8 min at 68ºC, followed by 7 min at 68ºC. Products were purified using a Pall Acroprep96 30kDa filter plate, quantified and sequenced on a

3730-XL DNA Analyzer at the Genetic Analysis Services at Otago University. As shown in Supplementary Figure 1, PCR with primers 1+3 gave a fragment of 830 bp with the same sequence in three different S. punctatus individuals from three different locations in New Zealand. PCR using primers 2+4 did not amplify any band, suggesting that they bind more than 5 kb away from exon3 to the gap boundary.

3. Results

3.1. Phylogeny of the podoplanin coding sequence

Figure 2 shows the phylogenetic tree obtained using the alignment of nucleotide sequences of podoplanin without trimming. The tree obtained with the trimmed alignment had the same topology, with minor differences in branch support values (not shown). This phylogeny was constructed using representatives of the major lineages of 11 extant vertebrates. The agreement between the species tree (Hedges and Kumar, (2009);

Vargas and Zardoya, (2014)) and the gene tree is remarkably high, with some exceptions (all of them with low support): 1) the relative phylogenetic position of bony fishes and lungfishes is likely spurious; 2) in amphibians, Caudata is recovered as sister to , instead of to Anura; 3) in primates, there are non-canonical groupings, like with gorillas, instead of chimpanzees.

Although the cytoplasmic domain is extremely well conserved across most of the vertebrate species included in the study, there are nonetheless some exceptions. Such is the case of the clade “glires” that comprises both lagomorphs and rodents (Fig. 3).

While lagomorphs present the canonical amino acid sequence, represented by

RKMSGRYSP, rodents present different changes to such basic structure; Sciuromorphs present a consistent Ser (S) to Gly (G) change, whereas in Castorimorpha and

Myomorpha is the Tyr (Y) that systematicly change to a Phe (F). These latter groups also present a variable third position, where Met (M) is transformed to either to an Iso

(I) or to a Val (V). The most radical change occurs in Hystricomorpha, where the CT domain is truncated but retains only the basic amino acids involved in cell motility

(Martín-Villar et al., 2006). The functional consequences of this change are not known, but interestingly, hystricomorphs are now being studied for its resistance to cancer

(Gorbunova et al., 2014).

In the case of (Gymnophiona), the sister group of the other two orders of extant amphibians, podoplanin also shows distinct variation that is not found in the other studied vertebrate groups. Fig. 4A shows a diagram of the different podoplanin isoforms found in caecilians. tentaculata and unicolor have an insertion in exon2a; while C. tentaculata present isoforms with and without the insertion, M. unicolor only presents isoforms without the insertion. In the case of C. 12 tentaculata, M. dermatophaga and Typhlonectes compressicauda, the end of the sequence shows heterogeneity likely due to the presence of an alternative exon4. Fig.

4B shows the sequence of the alternative forms in the cytoplasmic tail. All short CT have an extra Tyr (Y) before the stop codon, which could easily be obtained by a transversion. The large CT could be due to alternative splicing. Given that reference genomes are not yet available for Gymnophiona, these changes cannot be further studied.

3.2. Selective pressures on the podoplanin coding sequence

We have used the codeml program in the PAML package (Yang, 2007) to study what kind of evolutionary selection forces have acted on the podoplanin coding sequence, studying the ratio of non-synonymous vs synonymous substitutions

(=dN/dS). Codeml was run with the cleandata option activated, due to the diversity of the sequences in the different species. The results of several tests are shown in Table 1.

Site models (M2a vs. M1a and M8 vs. M7) clearly show that there are sites with positive selection, with a high significance level (p<0.01). The same can be said of the free-ratio model, which indicates that there are different lineages with >1 ratios.

In order to ascertain whether those species that gave >1 ratios in the free-ratio model were indeed subjected to positive selection, we additionally run branch-site tests, which assume that only some sites of one or more selected branches (foreground branches) on the phylogeny could have suffered positive selection. The specific terminal branches of each of those species that gave >1 ratios in the free-ratio model were used as foreground branches. Table 2 shows the results of these branch-site tests for those species with . 13

3.3. Evolution of the structure of the podoplanin gene

As mentioned above, the extracellular domain of podoplanin is entirely coded in the single exon2 in cartilaginous fishes (Elasmobranchii), bony fishes (Actinopterygii), and coelacanths and lungfishes (Sarcopterygii; Fig. 5). As lungfishes are the sister clade of tetrapoda (Irisarri and Meyer, 2016), which at least have two exons coding for the exracellular domain, we carried out a PCR experiment to ascertain this point. As shown in Fig. 6A, we were able to amplify a single band of the expected size. The sequence of this band showed only minor differences with respect to the sequence obtained from the databases (Fig. 6B). We show the experimental sequences aligned versus P. dolloi because there is an ATA codon (boxed) that is present in P. dolloi but not in P. annectens, and it was also present in the experimental sequences.

Amniotes, but not Squamata, have another exon, 2c, coding for 13 amino acids, in the extracellular domain. As mentioned above, S. punctatus (order Rhynchocephalia) is the sister taxon of extant Squamata, having diverged about 250 Myr ago (Jones et al.,

2013). To check if S. punctatus has a squamata-like (exon2c absent) or amniote-like

(exon2c present) podoplanin we performed the experiment shown in Supplementary

Fig. 1. Although we were not able to complete the sequence between exon2b and exon3

(see Materials and Methods section 2.6 and Supplementary Fig. 1), the partial transcriptome obtained by Miller et al (2012) does contain reads that cover the exon2b and exon3 junction. Therefore we assume that S. punctatus has a squamata-like gene

(i.e. exon2c absent).

3.4. Synteny of the podoplanin gene

In humans, the podoplanin gene is located in 1p36.21 (coordinates

13583757-13617957 in GRCh38.p7). PDPN is surrounded by LRRC38 (at the 5’ end) and PRDM2 and KAZN (at the 3’ end). The LRRC38 product, leucine rich repeat 14 containing protein 38 is the  auxiliary subunit of the high conductance potassium BK channel. PRDM2 is PR domain-containing protein 2, a H3K9 methylase. Finally, the product coded by KAZN, kazrin, is a periplakin interactin protein. This organization is maintained in almost all taxa studied, as shown in Fig. 7. The cluster of KAZN-PRDM2-

PDPN in one orientation, adjacent to LRRC38, in the opposite orientation, is maintained from cartilaginous fishes to mammals, irrespective of the differences in gene size or intergenic distances.

4. Discussion

4.1. Phylogeny of the podoplanin coding sequence

We report here a comprehensive phylogeny of the podoplanin gene, which is exclusively present in jawed vertebrates. All our attempts to find podoplanin orthologues in cyclostomes (jawless fishes) and non-vertebrate have been so far unsuccessful. It is remarkable, however, that the basic structure of the gene has remained almost unchanged since the emergence of jawed fishes, around 500 Myr

(Hedges, 2009). In this regard, the transmembrane plus cytoplasmic domains, which together are encoded by exons 3 and 4, are extremely well conserved. The transmembrane domain shows a characteristic GXXXG motif close to the extracellular side (which has been lost in the African and South American species of lungfishes).

Although with the already mentioned exceptions, the 9 amino acid long cytoplasmic domain is also conserven in the majority of the sequences studied.

It is also interesting to note that podoplanin has remained a single-copy gene in all vertebrate groups, even in bony fishes (Actinopterygii), which have undergone an extra genome duplication (Meyer and Van de Peer, 2005). In mouse models, absence of 15 podoplanin results in perinatal lethality (Ramirez et al., 2003; Mahtab et al., 2008).

Although no experiments have been reported about the effects of the presence of extra copies of podoplanin, indirect results could be consistent with deleterious effects of extra podoplanin copies during development. Murtomaki et al. (2013) have shown that reduced levels of Notch1 resulted in increased levels of Prox1-positive lymphatic endothelial progenitor cells, leading to deficient separation of lymphatic and venous vessels. As Prox1 is the master regulator of lymphatic endothelial cells, and the podoplanin promoter has sites for this transcription factor (Pan et al., 2014), it could be possible that higher levels of podoplanin could also affect the separation of lymphatic and venous vessels.

The two topological domains of podoplanin (extracellular and intracellular) seem to have evolved more or less independently, generating more diversity in the extracellular domain. Both domains are involved, in principle, in different cellular functions. The extracellular domain is involved in the maintenance and development of the lymphatic vasculature, in platelet aggregation and in T lymphocytes and dendritic cell interaction (Astarita et al., 2015). In contrast, the cytoplasmic domain is involved in cell motility through its interaction with the ERM proteins. Two main interacting proteins have been described in relation with the extracellular domain: the chemokine

CCL21 (Comerford et al., 2013) and the C-type lectin domain receptor CLEC1B

(Suzuki-Inoue et al., 2007). The mechanism of CCL21 interaction is not known.

CLEC1B has been crystalized in the presence of podoplanin peptides (Nagae et al.,

2014). The interaction takes place with the two first amino acids of the PLAG3 domain

(Glu-Asp, 47 and 48 in humans) and the sialic acid residue linked to the GalNAc residue in Thr-52 . Conversely, four arginine residues (107, 118, 152, 157) are responsible for this interaction in CLEC1B. A generalized PLAG domain (acid-acid- 16

XX-T/S) is found in almost all podoplanin molecules studied. However, it is more difficult to detect the three arginine residues in orthologues of CLEC1B. This molecule has been described in chicken (Neulen and Gobel, 2012), but a structural model based on the human protein showed no similarly positioned arginine residues that could interact with the acidic ones in podoplanin (not shown). The evolutionary analysis of this protein is nonetheless outside the scope of this study.

With respect to the cytoplasmic domain, the structural requirements for the interaction with ERM proteins has been extensively studied by our group (Martín-Villar et al., 2006). The three basic residues present in this domain constitute a general ERM binding motif present in several proteins, like ICAM1-3, CD43 and CD44 (Renart et al.,

2015). The FERM domain of ERM proteins, involved in the binding of membrane proteins probably predates the vertebrate radiation (Golovnina et al., 2005).

4.2. Selective pressures on the podoplanin coding sequence

The codeml program was run with the cleandata option activated, so most of the extracellular domain was eliminated, because it is the most variable within species. This is a general feature of membrane proteins (Sojo et al., 2016). The branch-site models

(A1 and A) indicate that there are some sites with significant (>95%) posterior probabilities for being selected. The numbering of the residues shown in Table 2 are referenced to the human sequence. Some of the residues can be correlated with functional sites. For instance, G-22 is the last residue of the signal peptide, where the peptidase should cut. G133 and G137 are the outermost residues of the homodimerization GXXXG motif, necessary for the proper function of podoplanin

(Fernandez-Muñoz et al., 2011), seems to be under strong purifying selection. The same 17 can be said of the of I-148 and V-152, located in close proximity to the -secretase cleavage site (Yurrita et al., 2014).

4.3. Structure of the podoplanin gene

Regarding the structure of the podoplanin gene, our data demonstrate that lungfishes have only one exon corresponding to the extracellular domain, so the insertion of the intron took place with the advent of tetrapods, at around 350 Myr ago

(Irisarri and Meyer, 2016) . In this regard, it is interesting to note that intron 2a in the podoplanin gene of all tetrapods is of phase 0. This is expected if an insertion event has to maintain the reading frame of the receptor gene. Furthermore, the insertion of an intron in the primitive exon2 is in agreement with the general random fragmentation process of Wang and Stein (2013), which predicts that introns are inserted mainly at the center of exons; cartilaginous fishes and lungfishes have an exon2 of around 68 amino acids, and the mean length of exons 2a and 2b for amphibians are of 30 and 41 amino acids, respectively. The scarcity of annotated genomic sequences for representatives precludes us addressing the origin of this insertion in more detail.

It is remarkable that Lepidosauria, that comprises S. punctatus and all the

Squamata have only two exons coding for the extracellular domain, instead of three, the normal situation in amniotes. The most parsimonious explanation for this situation is that amniotes acquired such new exon (probably by exonization of intronic sequences,

(Sorek, 2007), and that it was subsequently lost in lepidosaurians. Again, there is no sufficient genomic data available yet to further look for remnants of this exon in lepidosaurian intron2b.

18

4.4. Synteny of the podoplanin gene

Podoplanin gene is unique in all the species that we have studied. As shown in

Fig. 7, synteny around the podoplanin gene is very well conserved in all jawed vertebrates. In some cases, some of the neighbor have moved away. For instance, in the western painted turtle, C. picta , KAZN gene is at the end of chromosome 24, whereas the other three, LRRC38, PRDM2 and PDPN are in a more central part of the same chromosome (not shown). The extra whole genome duplication that took place at the radiation of the ray-finned fishes (Meyer and Van de Peer, 2005) makes the topology in this group more complicated, although podoplanin gene continues to be unique. For instance, in the zebra fish, the triad kazn-a, , prdm2-a is in chromosome 8, whereas kazn-b and prdm2-b are found in chromosome 11. Another example is the Nile tilapia, Oreochromis niloticus, where kazn-prdm2-pdpn are found in contig NC_031984.1, but another copy of kazn and prdm2 are found in contig

NC_031970.

In conclusion, we report here the detailed phylogeny of a type I transmembrane protein, present as a single copy gene in all jawed vertebrates (absent in jawless vertebrates and invertebrates). The general structure of the podoplanin gene is maintained in all species, but we have detected intron gains in tetrapods and amniotes and an exon loss in lepidosaurians. The podoplanin coding sequence is also generally conserved, although we find variation in the intracellular domain of amphibians and rodents.

Acknowledgments 19

JR and MQ were supported by grant SAF2013-46183-R from the Ministerio de

Economía y Competitividad, Spain. AA and DSM were financed by grant CGL2012-

40082, and DSM also supported by grant RYC-2011-09321, both from the Ministerio de Economía y Competitividad, Spain. Ngatiwai iwi and the New Zealand Department of Conservation permitted tuatara sampling, which was undertaken on our behalf by L.

Mickelson (Victoria University of Wellington). The Allan Wilson Centre and the

University of Otago funded the tuatara genome project that contributed sequence data to this work. We thank Mrs Joanne Gillum for her expert laboratory support in helping complete the tuatara gene sequence. We also acknowledge Dr. R. Diaz-Uriarte

(Institutuo de Investigaciones Biomédicas Alberto Sols, CSIC-UAM) for access to the

AMD-Opteron machine, purchased with funds from project BIO2009-12458 from the

Spanish MINECO, and Dr. Luis del Peso, from the same Institute for helpful discussiones

Bibliography

Abascal, F., Zardoya, R. and Telford, M.J., 2010. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res 38, W7-13. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J., 1990. Basic local alignment search tool. J Mol Biol 215, 403-10. Antequera, F., 2003. Structure, function and evolution of CpG island promoters. Cell Mol Life Sci 60, 1647-58. Astarita, J.L., Cremasco, V., Fu, J., Darnell, M.C., Peck, J.R., Nieves-Bonilla, J.M., Song, K., Kondo, Y., Woodruff, M.C., Gogineni, A., Onder, L., Ludewig, B., Weimer, R.M., Carroll, M.C., Mooney, D.J., Xia, L. and Turley, S.J., 2015. The CLEC-2-podoplanin axis controls the contractility of fibroblastic reticular cells and lymph node microarchitecture. Nat Immunol 16, 75-84. 20

Capella-Gutierrez, S., Silla-Martinez, J.M. and Gabaldon, T., 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-3. Comerford, I., Harata-Lee, Y., Bunting, M.D., Gregor, C., Kara, E.E. and McColl, S.R., 2013. A myriad of functions and complex regulation of the CCR7/CCL19/CCL21 chemokine axis in the adaptive immune system. Cytokine Growth Factor Rev 24, 269-83. Fernandez-Muñoz, B., Yurrita, M.M., Martin-Villar, E., Carrasco-Ramirez, P., Megias, D., Renart, J. and Quintanilla, M., 2011. The transmembrane domain of podoplanin is required for its association with lipid rafts and the induction of epithelial-mesenchymal transition. Int J Biochem Cell Biol 43, 886-96. Golovnina, K., Blinov, A., Akhmametyeva, E.M., Omelyanchuk, L.V. and Chang, L.- S., 2005. Evolution and origin of merlin, the product of the Neurofibromatosis type 2 (NF2) tumor-suppressor gene. BMC Evolutionary Biology 5, 69. Gorbunova, V., Seluanov, A., Zhang, Z., Gladyshev, V.N. and Vijg, J., 2014. Comparative genetics of longevity and cancer: insights from long-lived rodents. Nat Rev Genet 15, 531-40. Hantusch, B., Kalt, R., Krieger, S., Puri, C. and Kerjaschki, D., 2007. Sp1/Sp3 and DNA-methylation contribute to basal transcriptional activation of human podoplanin in MG63 versus Saos-2 osteoblastic cells. BMC Mol Biol 8, 20. Hedges, S., 2009. Vertebrates (Vertebrata). in: Hedges, S. and Kumar, S. (Eds.), The Timetree of Life. Oxford University Press, New York, pp. 309-314. Hedges, S.B. and Kumar, S., 2009. The Timetree of Life, Oxford University Press, New York. Huerta-Cepas, J., Serra, F. and Bork, P., 2016. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol Biol Evol 33, 1635-8. Irisarri, I. and Meyer, A., 2016. The Identification of the Closest Living Relative(s) of Tetrapods: Phylogenomic Lessons for Resolving Short Ancient Internodes. Syst Biol 65, 1057-1075. Jones, M.E., Anderson, C.L., Hipsley, C.A., Muller, J., Evans, S.E. and Schoch, R.R., 2013. Integration of molecules and new fossils supports a Triassic origin for Lepidosauria (lizards, snakes, and tuatara). BMC Evol Biol 13, 208. Kato, Y. and Kaneko, M.K., 2014. A cancer-specific monoclonal antibody recognizes the aberrantly glycosylated podoplanin. Sci Rep 4, 5924. 21

Katoh, K. and Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30, 772- 80. Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Meintjes, P. and Drummond, A., 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647-9. Mahtab, E.A., Wijffels, M.C., Van Den Akker, N.M., Hahurij, N.D., Lie-Venema, H., Wisse, L.J., Deruiter, M.C., Uhrin, P., Zaujec, J., Binder, B.R., Schalij, M.J., Poelmann, R.E. and Gittenberger-De Groot, A.C., 2008. Cardiac malformations and myocardial abnormalities in podoplanin knockout mouse embryos: Correlation with abnormal epicardial development. Dev Dyn 237, 847-57. Martin-Villar, E., Borda-d'Agua, B., Carrasco-Ramirez, P., Renart, J., Parsons, M., Quintanilla, M. and Jones, G.E., 2015. Podoplanin mediates ECM degradation by squamous carcinoma cells through control of invadopodia stability. Oncogene 34, 4531-44. Martin-Villar, E., Fernandez-Munoz, B., Parsons, M., Yurrita, M.M., Megias, D., Perez- Gomez, E., Jones, G.E. and Quintanilla, M., 2010. Podoplanin associates with CD44 to promote directional cell migration. Mol Biol Cell 21, 4387-99. Martín-Villar, E., Megias, D., Castel, S., Yurrita, M.M., Vilaro, S. and Quintanilla, M., 2006. Podoplanin binds ERM proteins to activate RhoA and promote epithelial- mesenchymal transition. J Cell Sci 119, 4541-4553. Martin-Villar, E., Scholl, F.G., Gamallo, C., Yurrita, M.M., Munoz-Guerra, M., Cruces, J. and Quintanilla, M., 2005. Characterization of human PA2.26 antigen (T1alpha-2, podoplanin), a small membrane mucin induced in oral squamous cell carcinomas. Int J Cancer 113, 899-910. Martin-Villar, E., Yurrita, M.M., Fernandez-Munoz, B., Quintanilla, M. and Renart, J., 2009. Regulation of podoplanin/PA2.26 antigen expression in tumour cells. Involvement of calpain-mediated proteolysis. Int J Biochem Cell Biol 41, 1421- 9. Meyer, A. and Van de Peer, Y., 2005. From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). BioEssays 27, 937-945. 22

Miller, H.C., Biggs, P.J., Voelckel, C. and Nelson, N.J., 2012. De novo sequence assembly and characterisation of a partial transcriptome for an evolutionarily distinct reptile, the tuatara (Sphenodon punctatus). BMC Genomics 13, 439. Miller, M., Pfeiffer, W. and Schwartz, T., 2010. Creating the CIPRES Science Gateway for Inference of Large Phylogenetic Trees, Gateway Computing Environments Workshop (GCE). New Orleans, LA, pp. 1-8. Murtomaki, A., Uh, M.K., Choi, Y.K., Kitajewski, C., Borisenko, V., Kitajewski, J. and Shawber, C.J., 2013. Notch1 functions as a negative regulator of lymphatic endothelial cell differentiation in the venous endothelium. Development 140, 2365-76. Nagae, M., Morita-Matsumoto, K., Kato, M., Kaneko, M.K., Kato, Y. and Yamaguchi, Y., 2014. A Platform of C-Type Lectin-like Receptor CLEC-2 for Binding O- Glycosylated Podoplanin and Nonglycosylated Rhodocytin. Structure 22, 1711- 1721. Nakazawa, Y., Sato, S., Naito, M., Kato, Y., Mishima, K., Arai, H., Tsuruo, T. and Fujita, N., 2008. Tetraspanin family member CD9 inhibits Aggrus/podoplanin- induced platelet aggregation and suppresses pulmonary metastasis. Blood 112, 1730-9. Neulen, M.L. and Gobel, T.W., 2012. Identification of a chicken CLEC-2 homologue, an activating C-type lectin expressed by thrombocytes. Immunogenetics 64, 389-97. Pan, Y., Wang, W.D. and Yago, T., 2014. Transcriptional regulation of podoplanin expression by Prox1 in lymphatic endothelial cells. Microvasc Res 94C, 96-102. Ramirez, M.I., Millien, G., Hinds, A., Cao, Y., Seldin, D.C. and Williams, M.C., 2003. T1alpha, a lung type I cell differentiation gene, is required for normal lung cell proliferation and alveolus formation at birth. Dev Biol 256, 61-72. Renart, J., Carrasco-Ramirez, P., Fernandez-Munoz, B., Martin-Villar, E., Montero, L., Yurrita, M.M. and Quintanilla, M., 2015. New insights into the role of podoplanin in epithelial-mesenchymal transition. Int Rev Cell Mol Biol 317, 185-239. Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., Hohna, S., Larget, B., Liu, L., Suchard, M.A. and Huelsenbeck, J.P., 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61, 539-42. 23

Sojo, V., Dessimoz, C., Pomiankowski, A. and Lane, N., 2016. Membrane Proteins Are Dramatically Less Conserved than Water-Soluble Proteins across the Tree of Life. Mol Biol Evol 33, 2874-2884. Sorek, R., 2007. The birth of new exons: mechanisms and evolutionary consequences. RNA 13, 1603-8. Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post- analysis of large phylogenies. Bioinformatics 30, 1312-3. Suzuki-Inoue, K., Kato, Y., Inoue, O., Kaneko, M.K., Mishima, K., Yatomi, Y., Yamazaki, Y., Narimatsu, H. and Ozaki, Y., 2007. Involvement of the snake toxin receptor CLEC-2, in podoplanin-mediated platelet activation, by cancer cells. J Biol Chem 282, 25993-6001. Ugorski, M., Dziegiel, P. and Suchanski, J., 2016. Podoplanin - a small glycoprotein with many faces. Am J Cancer Res 6, 370-86. Vargas, P. and Zardoya, R., 2014. The Tree of Life, Sinauer Associates Inc., Sunderland, US. Wang, L. and Stein, L.D., 2013. Modeling the evolution dynamics of exon-intron structure with a general random fragmentation process. BMC Evolutionary Biology 13, 57. Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24, 1586-91. Yurrita, M.M., Fernandez-Munoz, B., Del Castillo, G., Martin-Villar, E., Renart, J. and Quintanilla, M., 2014. Podoplanin is a substrate of presenilin-1/gamma- secretase. Int J Biochem Cell Biol 46, 68-75.

LEGENDS TO THE FIGURES

Fig. 1. Structure of the podoplanin gene in humans. A. Diagram showing the different domains of podoplanin (upper part) and the exon structure (lower part). Introns are variable in length within species and are not represented in the diagram. SP, signal peptide; EC, extracellular domain; TM, transmembrane domain; CT cytoplasmic domain. Numbers refer to the amino acid residue position in the human protein. B.

Amino acid sequence of the human podoplanin. Alternating amino acid colors indicate 24 different exons. Signal peptide is shown with a dashed box. Platelet aggregation- stimulating (PLAG) domains are boxed with continuous lines. The cytoplasmic domain is underlined. Serine (S) and threonine (T) residues described to be O-glycosylated

(Kato and Kaneko, 2014) are double-underlined. Asterisk indicate the stop codon. C.

Diagram of the boundary between exons 3 and 4 of podoplanin showing the possible alternative ends of the coding sequence. For transcriptomic data, the possible finals are

GRYSP-stop (continuous line) or GRP-stop (discontinuous line), while for genomic data usually is GRYS-stop (boxed with continuous line).

Fig. 2. Phylogenetic tree of the podoplanin coding sequence in jawed vertebrates.

Nodes with a filled circle denote support for BI (PP≥0.95) and ML (BP≥70%). Open circles denote support for BI but nor for ML (BP<70%). Absence of circles denotes no support from any of the phylogenetic methods. Details of the species studied are shown in Supplementary Table 1. The original trees generated by RAxML and MrBayes are

Supplementary documents 1 (newick format) and 2 (nexus format), respectively

Fig. 3. Phylogeny of podoplanin for rodents (extracted from Fig. 2) and differences in the sequence of the CT domain. The different sub-orders of rodents are indicated, as well as their sister group Lagomorpha. See text for details.

Fig. 4. Phylogeny of podoplanin for caecilians showing differences of the CT domain. A. Diagram showing the different isoforms found in Gymnophiona. Exons and introns are not drawn to scale. B. Sequence of the different cytoplasmic domains. See text for details.

Fig. 5. Schematic diagram of podoplanin gene structure in different taxa. Exons and introns are not to scale. 25

Fig. 6. Unique exon2 in Protopterus. A, amplified band; B, alignment of experimental and database sequence of exon2. A putative PLAG domain (Acid-Acid-x-x-T) is highlighted with a dashed box. Arrows indicate the oligodeoxyncleotides used for the

PCR reaction. Identical bases are denoted by dots.

Fig. 7. Synteny of the podoplanin region in different species. The figure depicts the relative order and orientation of the KAZN , PRDM2, PDPN and LRRC38 genes in the species shown. 5’ or 3’ on the left indicates that the genes are annotated in specific (H.sapiens, chr1; M. musculus, chr4; G. gallus, chr21; C. picta, chr 24; X. laevis, chr 7L; D. rerio, chr 8; T. rubripes, chr 3). The orientation of chromosomes or contigs is set to show synteny. Absence of the 5’/3’ label indicates that the podoplanin region is not assigned to any chromosome. Genes not present in the diagrams may be located in different scaffolds or in different coordinates of a chromosome, like in C. picta, in which KAZN gene is not contiguous to PRDM2, PDPN and LRRC38.

Supplementary Fig. 1. Completion of the Sphenodon puntatus exon3 sequence.

Upper panel, diagram showing the region of interest of contig ScrUdWx_2583 (not to scale). Exon 2b is approximately 14 kb downstream of the gap (dashed part of the line).

For exon 3, continuous line indicates the known sequence. Arrows with numbers 1 to 4 refer to the primers used (see Materials and Methods, section 2.6). Left panel: gel electrophoresis of the bands obtained in PCR with primers 1+3 in three different S. punctatus isolates. Arrow indicates the 800 bp marker. Right panel shows the sequence of the three PCR products. Identical bases are denoted by dots. No products were obtained using primers 2+4, indicating that the gap is probably >5 kb. The continuous box indicate the GXXXG homodimerization motif. The vertical line after the TC bases in the serine codon denotes the exon/intron boundary.

Supplementary table 1: List of species uses in this study. 26

Supplementary doc1: ML phylogenetic tree in newick format.

Supplementary doc 2: BI phylogenetic tree in nexus format. Table 1

Table 1. General models for positive selection tested on the vertebrate podoplanin dataset.

Model -lnL LRT P-value Proportion ω of sites (dN/dS) One Ratio (M0) 10718.05 0.3512 Free Ratio 10470.27 495.56 0.0006*

Nearly neutral (M1a) 10640.12 0.32 0.2759 0.68 1

Positive selection (M2a) 10633.78 12.68 0.0017* 0.29 0.2735 0.57 1 0.14 1.8615

Beta (M7) 10525.01 0.875 0.3727

Beta + >1 (M8) 10518.47 13.08 0.0014* 0.98 0.3500 0.02 1.44

Table 2

Table 2. Results of the branch-site model test for those species having ω (dN/dS) > 1.5 in the free-ratio model test. For each species, lnL, LRT, ω, and p-values are shown.

Branch  -lnL -lnL LRT ω p-value Selected codonsa,b from model A null free- model A1 ratio model Paralichthys olivaceus 2.27 10632.07 10639.23 14.32 23.30 0.0001* I148** Maylandia zebra 2.84 10636.70 10636.71 0.02 2.77 0.444 Mola mola 1.53 10636.29 10636.49 0.40 6.66 0.264 Pundamilia nyererei 2.92 10636.70 10636.70 0 2.47 0.5

Lepidosiren paradoxa 98.57 10636.70 10640.11 6.82 3.98 0.004* Protopterus annectens 1.27 10634.75 10635.29 1.08 131.09 0.149

Microcaecilia 10.00 10624.13 10628.73 9.20 999 0.001* G22*, 13T**, 20T**, V129*, T130**, dermatophaga G132**, G137**, L139*, G146**, I148*, V152* Caecilia tentaculata 1.54 10638.47 10636.10 -4.74 2.27 0.5 Typhlonectes 4.38 10635.75 10635.97 0.44 151.14 0.253 compressicauda

Xenopus laevis 2.96 10.638.75 10.635.26 -6.98 5.38 0.5

Ophiophagus hannah 2.27 10631.9 10633.54 3.28 25.62 0.035* L11*, I148**

Alligator mississippiensis 1.96 10634.75 10637.5 5.5 1.59 0.009*

Pantholops hodgsonii 3.51 10636.70 10636.71 0.02 2.98 0.444 Myodes glareolus 1.58 10640.12 10640.12 0 2.61 0.5 Dipodomys ordii 2.75 10635.83 10640.26 8.86 10.13 0.001* Choloepus hoffmanni 2.22 10633.39 10633.59 0.4 2.54 0.264 Tarsius syrichta 2.60 10635.48 10640.35 9.74 46.13 0.0009* Myotis brandtii 1.48 10638.51 10639.37 1.72 6.99 0.095 Giraffa camelopardalis 2.50 10640.01 10640.11 0.2 4.75 0.328 aObtained from the Bayes empirical Bayes (BEB)(Yang et al.. 2005) analysis; posterior probability for >95% (*) or >99% (**). bCodon numbers are the equivalents in the human sequence shown in Fig. 1B. See section 4.2 for more details.

Figure 1

A

B SP PLAG1 PLAG2 PLAG3 MWKVSALLFV LGSASLWVLA EGASTGQPED DTETTGLEGG VAMPGAEDDV VTPGTSEDRY KSGLTTLVAT PLAG4 SVNSVTGIRI EDLPTSESTV HAQEQSPSAT ASNVATSHST EKVDGDTQTT VEKDGLSTVT LVGIIVGVLL

AIGFIGAIIV VVMRKMSGRY SP*

C Intron 3

ATGCGAAAAATGTCGGGAAGGTACTCgtaagtaaa...ttcacagGCCCTAA M R K M S G R Y S P * Exon 3 Exon 4 Figure 2

Chondrichthyes

Dipnoi

Actinopterygii Coelacanthimorpha

Amphibia

Sphenodontia

Squamata Testudines

Crocodylia

Aves Monotremata Marsupilalia Afrotheria Xenarthra

Boroeutheria

Scandentia Figure 3

RKISGRFSP KKISGRFSP RKISGRFSP RKISGRFSP RKISGRFSP RKVSGRFSP MYOMORPHA RKVSGRFSP RKVSGRFSP RKISGRFSP RKISGRFSP RKISGRFSP RKISGRFSP RKVSGRFSP CASTORIMORPHA RKVSGRFSP RKISK RNMSK RKMSK RKMSK HYSTRICOMORPHA RKMSN WKMSN RKMGGRYSP RKMGGRYSP SCIUROMORPHA RKMGGRYSP RKMSGRYSP RKMSGRYSP LAGOMORPHA RKMSGRYSP Figure 4 A

exon1 exon2a exon2b exon3 exon4 exon4’ Rhinatrema bivittatum bannanicus Typhlonectes compressicauda

Caecilia tentaculata

Microcaecilia unicolor Microcaecilia dermatophaga

B RKMSGRYTDEDTLELEDMNRSSKEALLRSR RKMSGRYTPY RKMSGRYTPY RKMSGRYTPY RKMSGRYTPY RKMSGTYSPYQRNLTSL RKMSGRYTPY RKMSGRYTPY RKMSGRYTDEEILELEDMHRSFKEALLRSRVSL Figure 5

Amniota Ex 1 Ex 2a Ex 2b Ex 2c Ex 3 Ex 4

Sauropsida Tetrapoda

Ex 1 Ex 2a Ex 2b Ex 3 Ex 4

Amphibia

Ex 1 Ex 2 Ex 3 Ex 4

? Figure 6

A B

A Q N I S G D T N G K Q E P K D T A T P P. dolloi GCGCAAAACATCTCAGGTGATACAAATGGAAAACAGGAACCGAAGGATACTGCAACTCCG #6 ----CC...... A...... A...... T...... #13 ----CC...... A...... #26 ---...... A...... #27 -----...... A......

S P E D V Q T L H T V A S D I M K Y L Q P. dolloi TCTCCTGAGGATGTTCAAACCCTACACACAGTAGCAAGTGATATAATGAAATACCTTCAG #6 ...... G....A...... A...A.G...... #13 ...... C...... #26 ...... C...... #27 ...... C......

T T T Q T A S N D T A E E V T K T T V I P. dolloi ACTACAACACAGACTGCCAGCAATGACACAGCAGAAGAAGTTACAAAGACTACTGTCATA 200 bp #6 ...... G...... A...... T...... G...... - #13 ...... T...... G...... - #26 ...... T...... G...... - #27 ...... T...... G...... -

I E N S D P. dolloi ATTGAAAATTCTGATGG #6 ------#13 ------#26 ------#27 ------Figure 7

H. sapiens 5’ LRRC38 PDPN PRDM2 KAZN

M. musculus 3’ LRRC38 PDPN PRDM2 KAZN

L. africana 5’ LRRC38 PDPN PRDM2 KAZN

D. novemcinctus LRRC38 PDPN PRDM2

G. gallus 3’ LRRC38 PDPN PRDM2 KAZN

S. camelus LRRC38 PDPN PRDM2

A. mississipiensis LRRC38 PDPN PRDM2 KAZN

C. Picta 3’ LRRC38 PDPN PRDM2

P. bivittatus LRRC38 PDPN PRDM2 KAZN

S. punctatus LRRC38 PDPN

X. laevis 3’ LRRC38 PDPN PRDM2 KAZN

L. chalumnae LRRC38 PDPN PRDM2

C. milii PDPN PRDM2

D. rerio 3’ PDPN PRDM2a KAZNa

T. rubripes 3’ PDPN PRDM2 KAZN

P. reticulata PDPN PRDM2 KAZN Supplementary Table 1 Click here to download Supplementary material for on-line publication only: Supplementary_Table_1.xlsx

Supplementary Figure 1 Click here to download Supplementary material for on-line publication only: Suppl_Fig1.pptx

ML tree Click here to download Supplementary material for on-line publication only: Supplementary_Doc1_ML_tree.nwk

Bayesian tree Click here to download Supplementary material for on-line publication only: Supplementary_Doc2_BAYES_tree.nx

Phylogenetic tree data Click here to download Phylogenetic tree data: ML_tree.nwk