Article Universal nomenclature for oxytocin– vasotocin ligand and receptor families

https://doi.org/10.1038/s41586-020-03040-7 Constantina Theofanopoulou1,2,3 ✉, Gregory Gedman1, James A. Cahill1, Cedric Boeckx2,3,4 & Erich D. Jarvis1,5 ✉ Received: 16 October 2019

Accepted: 29 May 2020 Oxytocin (OXT; hereafter OT) and arginine vasopressin or vasotocin (AVP or VT; Published online: 28 April 2021 hereafter VT) are neurotransmitter ligands that function through specifc receptors Open access to control diverse functions1,2. Here we performed genomic analyses on 35 species Check for updates that span all major vertebrate lineages, including newly generated high-contiguity assemblies from the Vertebrate Genomes Project3,4. Our fndings support the claim5 that OT (also known as OXT) and VT (also known as AVP) are adjacent paralogous that have resulted from a local duplication, which we infer was through DNA transposable elements near the origin of vertebrates and in which VT retained more of the parental sequence. We identifed six major oxytocin–vasotocin receptors among vertebrates. We propose that all six of these receptors arose from a single receptor that was shared with the common ancestor of invertebrates, through a combination of whole-genome and large segmental duplications. We propose a universal nomenclature based on evolutionary relationships for the genes that encode these receptors, in which the genes are given the same orthologous names across vertebrates and paralogous names relative to each other. This nomenclature avoids confusion due to diferential naming in the pre-genomic era and incomplete genome assemblies, furthers our understanding of the evolution of these genes, aids in the translation of fndings across species and serves as a model for other families.

OT and VT act as hormones or neurotransmitters that—through their varied biochemical-based and evolutionary-based terminologies have led respective G--coupled receptors—regulate a wide range of bio- to confusion as regards the orthology and paralogy of these genes, which logical functions, including uterine contractions and milk ejection in is emblematic of a wider problem in . placental mammals; copulation, bond formation, thermoregulation, Here we analysed the genomes of 35 species that span all the major nesting behaviour and social vocalizations (for oxytocin) across many vertebrate lineages as well as an additional 4 outgroup genomes from vertebrate and some invertebrate groups; and antidiuresis, blood pres- invertebrate lineages (Supplementary Table 2); these included several sure, parental care and reproduction (for vasotocin) in mammals and/ species that were sequenced with long-read and long-range scaffold- or other vertebrates and invertebrates1,2 (Supplementary Table 1, Sup- ing technologies by the Vertebrate Genomes Project (VGP) (https:// plementary Note 1). In the pre-genomic era, small differences in amino vertebrategenomesproject.org/), which filled gaps and corrected acids of the OT and VT hormones in different species or lineages led bio- errors of previous shorter-read assemblies3. On the basis of gene syn- chemists to give them and their receptors different names: for example, teny, sequence identity, family tree and other analyses, we propose mesotocin in birds, reptiles and frogs, and isotocin in teleost fish, for that OT and VT are paralogous genes that arose through a local dupli- the apparent oxytocin complement of mammals; and vasopressin in cation via DNA transposable elements near the origin of vertebrates. mammals for the apparent vasotocin complement in other vertebrates6. We propose that the OTR-VTR genes evolved by a combination of It has previously been hypothesized that OT and VT are the product of a whole-genome duplication and segmental duplication, which led to local duplication near the origin of vertebrates5. However, the evolution- six receptors near the origin of jawed vertebrates with lineage-specific ary trajectory of the receptors is under debate7–10. One recent view9 is that losses and gains thereafter. With this improved understanding of the the genes that encode the OT and VT receptors (hereafter, OTR-VTRs) relations between the OTR-VTR genes, we propose a universal verte- evolved through two rounds of whole-genome duplication in the ancestor brate nomenclature based on evolutionary relationships (Table 1). of cyclostomes. An alternative view10 posits that the OTR-VTRs evolved by one round of whole-genome duplication shared by agnathans and gnathos- tomes, followed by segmental duplications. However, these studies used Approach highly fragmented genome assemblies and inconsistent annotations, and In all genomes, we initially searched for OT, VT and OTR-VTR genes could not conclusively resolve the evolution of the OTR-VTRs. The resulting using pair-wise BLAST and BLAT analyses, and analysed the synteny

1Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA. 2Section of General Linguistics, University of Barcelona, Barcelona, Spain. 3University of Barcelona Institute for Complex Systems, Barcelona, Spain. 4ICREA, Barcelona, Spain. 5Howard Hughes Medical Institute, Chevy Chase, MD, USA. ✉e-mail: [email protected]; [email protected]

Nature | Vol 592 | 29 April 2021 | 747 Article Table 1 | Previous and proposed terminology for genes encoding OT and VT ligands and receptors in vertebrates

Mammals Birds Turtles and Frogs Fish Sharks Universal vertebrate crocodiles revision Oxytocin (OXT, OT, Oxy) Mesotocin (MT, MST) Mesotocin Mesotocin Mesotocin (MT) Valitocin Oxytocin (OT) Neurophysin (NPI) Oxt-like (MT, MST) (MT, MST) Isotocin (IT, IST) Aspargtocin Mesotocin (MT) Neurophysin-1-like Glumitocin Neurophysin IT-1-like, IT-NP Arginine vasopressin Vasotocin (VT) Vasotocin (VT) Vasotocin (VT) Vasotocin (VT) Vasotocin (VT) Vasotocin (VT) (AVP, ARVP, AVRP, Vp, Vsp) VT-NP, avpl, vsnp Neurophysin II (NP2) Lysine vasopressin Phenypresin OXTR, OTR VT3, MTR OXTR MesoR, OXTR ITR, OXTR, itnpr-like 2, itr2 OXTR Oxytocin receptor (OTR) AVPR1a, V1aR, V1A VT4, VT4R Avpr1, VasR Avpr1aa, VasR, Avpr1ab Vasotocin receptor 1A (VTR1A, V1A) AVPR1b, V1bR, (A) VT2, AVT2R Vasotocin receptor 1B VPR3, V3, VIBR (VTR1B, V1B) VT1, AVPR2 Avpr2.2 V2C, V2bR2, Avpr2.2, V2L V2C, V2bR2 Vasotocin receptor 2A (VTR2A, V2A) V2B, V2BR1, V2RI, OTRI, nft, Vasotocin receptor 2B avpr2 (VTR2B, V2B) AVPR2, V2R, VPV2R Avpr2bb, V2A(2), avpr2a(a) Vasotocin receptor 2C (VTR2C, V2C)

Long (for example, VTR1A) and short (for example, V1A) versions of the gene symbols are given. Aliases include terminology in the NCBI gene database. A complete list of aliases can be found in Supplementary Table 4a–e. of these genes from microchromosomal to macrochromosal scales In three of the four invertebrate species we analysed, we identi- between and within species. We then assessed congruence between fied a single gene that was structurally similar (3–4 exons)—but not synteny, sequence identity and gene family trees. syntenic—to the vertebrate VT (Supplementary Table 5), supporting previous findings15–17. The exception was amphioxus, which had three copies of the VT gene: two on the same scaffold 23 kb apart from each Evolution of the VT and OT ligands other, and the other on another scaffold in a paralogous syntenic ter- On the basis of BLAST searches, sequence identity and manual ritory (Supplementary Table 5, Supplementary Note 2b). Two of these microsynteny analyses within a ten-gene window (microchromo- three genes had previously been noted13,18, which—together with our somal), we found the human VT orthologue (that is, AVP) in all ver- data—indicates several lineage-specific duplications in amphioxus. tebrates analysed (Fig. 1a, Supplementary Fig. 1, Supplementary To test the hypothesis that vertebrate OT could be a tandem dupli- Table 3). Only the putative VT in hagfish did not have genes in syn- cation of VT5, we searched for DNA transposable elements, which are teny with any other vertebrate (presumably owing to the fragmented known to drive gene duplications19. We found transposable elements assembly), but the gene tree of this putative VT formed an immediate around OT (for example, in human and chimpanzee), but not around node with the lamprey VT (Extended Data Fig. 1a)—which suggests VT (Fig. 1b, Supplementary Tables 6, 7). These transposable elements it is the VT homologue. In jawed vertebrates (after the divergence had terminal inverted repeats, which are known to transpose through a of the lamprey and hagfish), we found the OT orthologue directly cut-and-paste mechanism that creates an extra copy at the donor site19. adjacent to VT except in teleost fish (Supplementary Fig. 1, Sup- We searched for other features that are encountered in duplicated plementary Table 4a). In teleosts, OT was translocated nearby on genes, such as intron shortening and/or an increase in GC content20: the same (or to a separate chromosome in zebrafish; both of the human OT introns were shorter than the VT introns, with Supplementary Table 4a), which supports more rearrangements in the first OT intron also being 13% richer in GC content (77.9% versus teleosts11. The spotted gar—which represents the divergence of the 64.6%) (Fig. 1b). These relationships varied among species, with the holosteans, sister to the teleosts—had both OT and VT together in the elephant shark—representing a more basal vertebrate divergence than translocated OT region found in teleosts, indicating that there was that of human—showing a large decrease in length of only the first OT first a translocation and then a relocation of VT in teleosts near its intron compared to VT (3,226 bp versus 1,158 bp) but similar GC content. original location. A previous short-read assembly of the megabat had We also found that the orientation of the genes was tail-to-head OT as the only gene on a scaffold, indicating a fragmented assembly. OT-to-VT (same direction) in nearly all vertebrates (including marsupial The pale spear-nosed bat assembly from the VGP3 and Bat1K project4 mammals)—except for placental and monotreme mammals, in which revealed a local triplication of the OT and VT genes. We found sup- the orientation was tail-to-tail (OT inverted) (Supplementary Table 8). port for this triplication using single Pacbio long reads and Bionano This is indicative of the fact that, after the original OT tandem duplica- optical maps that spanned the entire region (Extended Data Fig. 1b, tion of VT, OT inversions either occurred independently at the origin c, Supplementary Note 2a): such duplications are known to be hard of monotremes and placental mammals (as previously suggested13) to assemble with short reads12. An OT orthologue was not found in or occurred at the origin of mammals with marsupials reverting back lampreys and hagfish, which provides support for a previous result to the tail-to-head orientation. We also identified an independent OT in lamprey13. This previous report was inconclusive owing to the fact inversion in the spotted gar (Supplementary Table 8). The totality of our that the assembly was generated from the sized-down, programmed findings suggest that OT is a local tandemly duplicated copy of VT that and rearranged somatic genome, whereas we analysed a long-read arose after the divergence of jawed vertebrates, which was followed by germline genome of the sea lamprey14; the inshore hagfish data are divergences in introns, GC content, gene orientation, translocations from a short-read germline genome. and further duplications in different lineages.

748 | Nature | Vol 592 | 29 April 2021 a VT OTR VTR1B VTR2B OT VTR1A VTR2A VTR2C

Hagsh

* Lampreys

Sharks

Holostean sh

Gnathostomes Teleost sh

Coelacanths Bony sh Amphibians

Lobe-nned sh Lizards

Turtles Tetrapods Crocodiles

Amniotes Birds

Mammals

b OT VT Exon 1 Exon 2 Exon 3 Exon 3 Exon 2 Exon 1 + Intron 1 Intron 1 – 3,071,000 3,085,000 Intron 2 Intron 2 GC content: 77.9% GC content: 64.6% TE-TIR TE-TIR – +

Fig. 1 | Phylogenetic distribution and local gene duplication. single phylum or two separate phyla35,37. b, Local chromosomal organization of a, Phylogenetic distribution of OT, VT and OTR-VTR genes among vertebrates. the OT and VT region. Representation of the position (in kb), orientation (+ or −) Filled circles, presence of a gene; empty circles, loss of a gene; no circle, the of OT and VT genes (exons + introns) in human , intron length gene never evolved in that lineage. Phylogenetic tree based on ref. 36. (scale, 100 bases), GC content and DNA transposable elements with terminal *Unresolved relationship for whether hagfishes and lampreys constitute a inverted repeats (TE-TIRs) (green).

the ligands, with evolution-based suffixes (Table 1) from evidence high- A universal nomenclature for OT and VT lighted in the ‘Evolution of the VT and OT ligands’ section. On the basis of these findings, we propose a universal nomenclature On the basis of microsynteny analyses, we found the gene that is in which oxytocin (that is, OT) and vasotocin (that is, VT) are used commonly defined as the oxytocin receptor (OXTR) in mammals or for these genes in all jawed vertebrates, and VT is used in all jawless the mesotocin receptor (MTR) in birds (henceforth referred to as OTR) vertebrates and closely related invertebrates. We believe that these in a well-conserved syntenic region in nearly all of the vertebrates we genes should be named in this manner because it portrays their evo- examined (Fig. 2a, Supplementary Fig. 2). We found similar results for lutionary history, as is standard practice for other genes that are the gene known as arginine vasopressin receptor 1 (AVPR1) or arginine orthologous across species (for example, FOXP1) and paralogous vasopressin receptor 1A (AVPR1A) in mammals, or vasotocin recep- within species (for example, FOXP2, FOXP3 and FOXP4). According tor 4 (VT4) in some non-mammals (henceforth referred to as vasotocin to this practice, the genes encoding these two peptides would be receptor 1A (VTR1A)) (Supplementary Fig. 3). By contrast, the gene named vasopressin 1 (AVP1) and vasopressin 2 (AVP2), vasotocin 1 known as AVPR3 or AVPR1B in mammals or as vasotocin receptor 3 (VT1) and vasotocin 2 (VT2) or oxytocin 1 (OT1) and oxytocin 2 (OT2). (VT3) in non-mammals (henceforth referred to as vasotocin recep- As we realize that this would be a far-reaching shift from the existing tor 1B (VTR1B)) was present in all tetrapods, sharks and coelacanths, nomenclature, we propose that the common origin of these genes but was absent in other fish and lampreys (Supplementary Fig. 4). The be portrayed through the shared suffix -tocin, and paralogy con- syntenic territory of VTR1B was present in all fish except lampreys, veyed through different root words oxy- and vaso-. Vasotocin is a which indicates a gain of VTR1B after the divergence from lampreys that name that is already used by most scientific communities focusing was followed by a loss in holosteans and teleosts after their divergence on non-mammalian species (Table 1). Furthermore, the name ‘argi- from coelacanths and sharks (Fig. 1a). In addition, teleosts showed rear- nine vasopressin’ (AVP) entails that this gene encodes an arginine as ranged syntenic gene blocks on each side of OTR, each side of VTR1A the eighth , which is not the case for all mammals6. For and on one side of the (lost) VTR1B (Supplementary Figs. 2–4). non-mammalian species, this means that the peptides currently The gene known as vasotocin receptor 1 (VT1) in birds, and by sev- known as mesotocin, isotocin, glumitocin, valitocin, aspargtocin eral other names in other lineages (Table 1) (henceforth referred to and neurophysin in different lineages would now be called by one as vasotocin receptor 2A (VTR2A)), was found in conserved synteny orthologous name (that is, oxytocin) (Table 1). in reptiles, mammals and some fish, although its syntenic territory was present in all of these lineages (Supplementary Fig. 5)—indicating independent losses (Fig. 1a). The gene known as arginine vasopressin Six vertebrate OTR–VTRs receptor 4 (AVPR4) in fish (henceforth referred to as vasotocin recep- Our manual microsynteny analysis within a ten-gene window revealed tor 2B (VTR2B)) was detected only in fishes and lampreys but its syntenic six paralogous receptors among vertebrates (Fig. 1a, Supplementary territory was detected in all vertebrates (Supplementary Fig. 6), which Tables 3, 4b–e). Most vertebrate species had four or five of the six recep- indicates a loss in the tetrapod ancestor (Fig. 1a). The gene commonly tors, and some had further lineage-specific duplications (Extended known as arginine vasopressin receptor 2 (AVPR2) in mammals or as Data Fig. 9, Supplementary Note 2c–e). For greater clarity, we present AVPR2A in fishes (henceforth referred to as vasotocin receptor2 C our findings using our proposed nomenclature of the root names for (VTR2C)) was found in all vertebrates except for lampreys, elephant

Nature | Vol 592 | 29 April 2021 | 749 Article a Human EDEM1 GRM7 LMCD1 SSUH2 CAV3 OTR RAD18 SRGAP3 THUMPD3 SETD5 LHFPL4 Chicken EDEM1 GRM7 LMCD1 CAV3 OTR RAD18 SRGAP3 THUMPD3 VHL IRAK2 Painted turtle EDEM1 GRM7 LMCD1 SSUH2 CAV3 OTR RAD18 SRGAP3 THUMPD3 VHL IRAK2 Tropical EDEM1 clawed frog GRM7 LMCD1 SSUH2 CAV3 OTR RAD18 SRGAP3 THUMPD3 VHL IRAK2 Zebra sh TEX264A GRM2A PARP3 CAV3 OTRa RAD18 SRGAP3 HEMK1 CISH TWF2A Coelacanth TEX264A GRM2B RRP9 PARP3 CAV3 OTR RAD18 NISCH STAB1 SMIM4 PBRM1 Spotted gar EDEM1 GRM7 LMCD1 SSUH2 CAV3 OTR RAD18 SRGAP3 THUMPD3 VHL KIAA0895L Elephant CAND2 RAD18 TUSC2 RASSF1 ZMYND10 SEMA3H shark SEC13 GRIP2 SSUH2 CAV3 OTR Sea lamprey SEMA3A TFE3 LMCD1 LMO6 SSUH2 OTRa TMEM5 SRGAP3 THUMPD3 GRIP1 VHL b VTR1A-VTR2A OTR-VTR2B VTR1B VTR2C VTR2B Family (Chr. 12–12+7) (Chr. 3–3) Chr. 1 Chr. X Chr. 3 ABCD ABCD2 ABCD1 AMIGO AMIGO2 AMIGO3 ARL/ARF ARF3 ARL8B ARL8A ARF4 ATP ATP2B2 ATP2B4 ATP6AP1 ATP2B3 BCAP BCAP29 BCAP31 CAMK CAMKV CAMK1 CAMK1G CAND CAND1 CAND2 CDHR CDHR3 CDHR4 CNTN CNTN1 CNTN6/4 CNTN2 CNTN3 CSRP CSRP2 CSRP1 DNASE1L DNASE1L1 DNASE1L3 DOCK DOCK4 DOCK11 DOCK3 DUSP DUSP9 DUSP7 DYRK DYRK2 DYRK3 FLN FLNA FLNB IFRD IFRD2 IKBK IKBKE IKBKG IRAK IRAK3/4 IRAK2 IRAK1 KLHDC8 KLHDC8A KLHDC8B L1CAM NRCAM CHL1 NFASC L1CAM LAMB LAMB1/4 LAMB3 LAMB2 LEMD LEMD3 LEMD1 EMD LHFPL LHFPL3 LHFPL4 LRRN LRRN3 LRRN1 LRRN2 LSMEM LSMEM1 LSMEM2 MAPKAPK MAPKAPK2 MAPKAPK3 MON MON2 MON1A PLXNA PLXNA4 PLXNA2 PLXNA3 PLXNA1 PLXNB PLXNB3 PLXNB1 PPM1 PPM1H PPM1M RASSF RASSF3 RASSF5 RASSF1 SRGAP SRGAP1 SRGAP3 SRGAP2 ARHGAP4 TREX TREX2 TREX1

Fig. 2 | Interspecies and intraspecies synteny analyses. a, Example of genes in black bold were found within a 10-Mb window of the deleted VTR2A on interspecies ten-gene microsynteny for OTR across vertebrates. Same colour, chromosome 12 or (in blue bold) 7, or within the 10-Mb window of the deleted orthologous genes. Black boxes, genome rearrangements. OTRa in the sea VTR2B on . Orange column, all genes listed (in black bold) were lamprey and zebrafish is orthologous to OTR in all other vertebrates. Human found within a 10-Mb window of VTR1B on (orange block). OTR is currently known as OXTR; tropical clawed frog otr is currently known as Yellow column, all genes listed (in black bold) were found within a 10-Mb oxtr. b, Intraspecies 10-Mb macrosynteny among 6 (block window of VTR2C on chromosome X (yellow block). Green column, an colours) for all OTR-VTR gene regions in humans whether present (OTR, VTR1A, alternative syntenic territory of VTR2B (green) was also found at a different VTR1B and VTR2C) or deleted (VTR2A and VTR2B). Gene families are listed location of chromosome 3. Genes not in bold are found outside of the strict alphabetically on the left. In the blue column, underlined genes were found 10-Mb window, but are on the same chromosome as the respective OTR-VTR within a 10-Mb window of VTR1A on chromosome 12. In the pink column, gene. underlined genes were found within a 10-Mb window of OTR on chromosome 3;

sharks and birds (Supplementary Fig. 7). In birds, the absent VTR2C was In hagfish, we found only two VTR genes. These genes are located on part of a larger block of about 20 genes that has been deleted21. These two different scaffolds, one containing VTR1 and the other containing findings indicate a gain of VTR2C in vertebrates after the divergence VTR2, and each is equally syntenic for gene families containing the of elephant sharks, followed by a loss in birds (Fig. 1a). Again, teleosts OTR and VTR2B combination and the VTR1A and VTR2A combination showed rearranged syntenies on either side of VTR2B and on one side in other vertebrates (Supplementary Table 4f, g); this indicates an of VTR2A and VTR2C (Supplementary Figs. 5–7). ancestral relationship to both chromosome combinations, possibly In all of the species we assembled to chromosomal resolution, OTR via duplication. Higher-quality germline assemblies for hagfish should and VTR2B were syntenic on the same chromosome and separated reveal whether these two scaffolds are really separate or are part of by 10–30 genes (Extended Data Fig. 2a, Supplementary Table 4b). the same chromosome. Finally, VTR1B and VTR2C were singly found Similarly, VTR1A and VTR2A were also on the same chromosome or on different chromosomes or scaffolds in all species in which they scaffold and separated by 4–50 genes, except in mammals and fish were present (Supplementary Table 4d, e). We verified these findings (Supplementary Table 4c). In mammals, the syntenic genes (includ- with an independent, automated, more-quantitative and longer-range ing VTR1A) on one side of the deleted VTR2A were on chromosome 12 measure by using SynFind22 on alignments in up to 100-gene macrosyn- (human nomenclature), whereas those on the other side were on chro- teny windows around the receptors, in all major lineage combinations mosome 78,9 (Extended Data Fig. 2b, Supplementary Table 4c), which (Extended Data Fig. 3, Supplementary Note 3). indicates a fission that possibly involved the loss of VTR2A in mammals. In fish, there were complex patterns of rearrangements and duplica- tions but some species (for example, the three-spined stickleback, Chromosomal orthology and paralogy of OTR-VTRs gar, coelacanth and elephant shark) still contain VTR1A and VTR2A on To assess whether the interspecies synteny we identified was due the same chromosome (Supplementary Table 4c), which indicates to local segmental synteny within a chromosome or to entire lineage-specific chromosomal fissions and other rearrangements. chromosomal-scale orthology, we generated dot plots using SynMap2

750 | Nature | Vol 592 | 29 April 2021 a Sea lamprey scaffold 27 VTR1A-VTR2A 70 (OTR-VTR2B combination) OTR-VTR2B *P < 0.0001 VTR1B VTR2C 60 No orthologous receptor

*P = 0.0173 Chr. 4 *P = 0.0015

50 Chr. 6 Chr. 11 . 5

40 Chr Chr. 12 *P = 0.0001

30 . 7 Chr. 3 Chr Chr. 8

20 Chr. 4 Chr. 1 Chr. 3 Chr. 23 Chr. 7 Number of sea lamprey syntenic gene hits 10 Chr. 12 Chr. X Chr. 6 . 8 Chr. 2 Chr. 26 Chr

0 Japanese medaka Zebra sh Chicken Frog Human

b 80 Sea lamprey Scaffold 10 (VTR1A-VTR2A combination) *P < 0.0001 70

60 Chr. 3 *P = 0.0001 50

40 Chr. 1 Chr. 8

30 Chr. 4 Chr. 4

20 Chr. 2 Chr. 23 Chr. 7 Chr. 3 Chr. 5 Chr. 6 Chr. 1 Chr. X

Number of sea lamprey syntenic gene hits 10 Chr. 12 Chr. 7 Chr. 12 Chr. 26 Chr. 11

0 Chr. 23 Japanese medaka Zebra sh Chicken Frog Human

Fig. 3 | Analysis with SynMap2 identified syntenic gene hits between sea The minimum number of aligned homologous gene pairs to be considered lamprey scaffolds containing two receptors each and chromosomes of syntenic was 3 at a 20-gene maximum distance in each species. For other species. Bar graphs were created from dot plots. a, Sea lamprey scaffold comparisons including human, the minimum number was set to 2. *Significant 27 is most syntenic with chromosomes of other species that contain the differences between chromosomes with the highest number of gene hits OTR-VTR2B combination. b, Sea lamprey scaffold 10 is most syntenic with within a species (P < 0.05; χ2 test, two-sided; n = 199 genes located on chromosomes of other species that contain the VTR1A-VTR2A combination. scaffold 27; n = 246 genes located on scaffold 10). for entire chromosomes or scaffolds that contained OTR-VTR genes, containing the VTR1A-VTR2A and OTR-VTR2B combinations may be focusing on comparisons between species that represent major ver- paralogues from a whole-genome duplication. The third highest num- tebrate lineages and which were sequenced at chromosomal resolu- ber of syntenic gene hits were to chromosomes that contained VTR1B tion. By examining basal branches, we found the sea lamprey scaffold or VTR2C (in no particular order) (Fig. 3a, b), which suggests possible that contained the combination of OTR and VTR2B had the highest paralogous segmental duplications. Similar—but not as strong—results number of syntenic hits (30–60 genes) to the chromosome in all other were found for an apparent duplicate sea lamprey scaffold that con- vertebrates that had the combination of OTR and VTR2B (Fig. 3a). We tained one VTR gene (Extended Data Fig. 4a, b, Supplementary Note 1c). found a similar result between species in chromosomes containing the When we used the two scaffolds that contained the VTR1 and VTR2 combination of VTR1A and VTR2A (Fig. 3b). Exceptions included some hagfish genes as references, we found fewer syntenic gene hits to chro- fish, in which another chromosome had a similar number of syntenic mosomes of other species: chromosomes with the OTR-VTR2B and hits (Fig. 3a, b)—consistent with an extra chromosome paralogue from VTR1A-VTR2A combinations showed the highest number of hits, with no an additional whole-genome duplication. Mammals were also an excep- clear preference between them (Extended Data Fig. 4c–f, Supplemen- tion: here, the orthology of the sea lamprey scaffold containing the tary Note 4). These findings further support a deep ancestral paralogy VTR1A-VTR2A combination was split between two chromosomes (for between chromosomes that contain these two receptor combinations, example, human chromosomes 7 and 12) (Fig. 3b), consistent with a with possible ancestral chromosome representatives in hagfish. Our fission event. These results indicate that these two gene combinations findings are consistent with previous studies identifying chromosomal are syntenic, because each belongs to a chromosome orthologue of paralogues14,23–26, and further reveal newly identified candidate chro- vertebrates after the split with lampreys. The second highest gene hits in mosomal paralogues in species with genomes that—to our knowledge— most species were to the chromosome that contained the other recep- have not yet been compared (sea lamprey versus medaka, frog versus tor combination (Fig. 3a, b), which indicates that the chromosomes medaka and so on) (Fig. 3).

Nature | Vol 592 | 29 April 2021 | 751 Article

100 VTR1A (chicken) a 63 VTR1A (lizard) 100 VTR1A (human) 100 VTR1A (mouse) 50 VTR1A (frog) 41 VTR1A (coelacanth) 49 VTR1A (elephant shark) 53 85 VTR1A (spotted gar) 94 VTR1Aa (zebra sh) VTR1Ab (zebra sh) VTR1B (elephant shark) 100 100 VTR1B (frog) 60 VTR1B (coelacanth) 100 VTR1B (mouse) 75 VTR1B (human) 38 85 VTR1B (chicken) VTR1B (turtle) 100 OTR (chicken) 85 OTR ( nch) 87 OTR (alligator) OTR (lizard) 70 100 OTR (spotted gar) 80 30 69 OTRa (zebra sh) OTRb (zebra sh) 100 69 46 OTR (coelacanth) OTR (elephant shark) VTR1 100 OTR (frog) 100 OTR (human) OTR (mouse) 97 OTRb (sea lamprey) OTRa (sea lamprey) 72 VTR1A (sea lamprey) VTR1A (hag sh) 94 VTR2B (spotted gar) 86 VTR2Bb (zebra sh) VTR 100 100 VTR2B (coelacanth) 100 VTR2B (elephant shark) 100 VTR2A (chicken) 81 100 VTR2A ( nch) 96 VTR2A (spotted gar) VTR2A (hag sh) 74 VTR2A (sea lamprey) VTR2B (sea lamprey) 88 VTR2C (lizard) 93 100 VTR2C (turtle) 100 VTR2C (human) VTR2 15 VTR2C (mouse) 100 VTR2Cb (zebra sh) 100 100 VTR2Ca (zebra sh) VTR2C (spotted gar) 53 VTR2C (frog) VTR2C (coelacanth) 100 VTR (amphioxus) 100 VTR (amphioxus) VTR (amphioxus)

0.70 b

Tetrapods: VTR1A (75 homologues)

Coelacanth: VTR1A Ray- nned shes: VTR1A (20 homologues)

VTR1 Tetrapods: VTR1B (80 homologues)

Coelacanth: VTR1B Lamprey: ENSPMAG00000007650 (VTR1A) Hag sh: ENSEBUG00000001467 (VTR1) Lamprey: ENSPMAG00000001242 (OTRa) ENSPMAG00000009764 (OTRb) Tetrapods: OTR (76 homologues)

Coelacanth: OTR Ray- nned shes: OTR (20 homologues) VTR Amniotes: VTR2C (68 homologues)

Coelacanth: VTR2C Frog: VTR2C Ray- nned shes: VTR2C (22 homologues) VTR2 Ray- nned shes: VTR2B (14 homologues) Coelacanth: VTR2B Teleost shes: VTR2A (4 homologues) Lamprey: ENSPMAG00000002106 (VTR2B) Lamprey: ENSPMAG00000003512 (VTR2A) Tetrapods: VTR2A (7 homologues) Spotted gar: VTR2A Hag sh: ENSEBUG00000007964 (VTR2) Ciona: VTR

Fig. 4 | OTR-VTR gene family trees. a, Tree topology inferred with the TreeFam method on an amino acid alignment generated via the Ensembl ‘gene phylogenetic maximum likelihood method on an exon nucleotide alignment tree’ tool (gene tree identifier: ENSGT00760000119156). Left, red boxes (MAFFT), with 1,000 non-parametric bootstrap replicates. Bootstrap values denote inferred gene duplication node; blue boxes denote inferred speciation are shown as percentages at the branch points (values <50% were considered events; and turquoise boxes denote ambiguous nodes. Right, green bars less informative). The tree is rooted with the three VTR genes we found in denote multiple amino acid alignment made with MUSCLE; white areas denote amphioxus. The gene names of the current accessions (see Table 1 and gaps in the alignment; and dark green bars denote consensus alignments. Gene Supplementary Tables 4a–e for full list of synonyms) were written over names are revised according to our synteny-based orthology; Extended Data according to our revised synteny-based orthology. Scale bar, phylogenetic Fig. 8 shows a tree with the current nomenclature in Ensembl. distance of 0.78 substitutions. b, Tree topology inferred with the phylogenetic

and 7, versus chromosome 3). We also found an extra gene-family Rapid radiation of the VTR1 and VTR2 families territory on human chromosome 3 that is syntenic with the VTR2B We next assessed paralogues within species to help to determine territory (Fig. 2b). However, we did not detect any VTR1 or VTR2 genes evolutionary relationships among the receptors. We analysed 10-Mb within a species that shared substantially more synteny than others macrosynteny windows between and within chromosomes of the (Supplementary Table 9a, b). Further, no gene family was present in same species (intraspecies) for all 6 receptors (whether present or the territory of all six present or deleted receptors. At a more micro- deleted). Within species (for example, human), we found paralogous scale level, among the exons and introns of OTR-VTR genes (Extended ‘gene families’ in syntenic blocks around all OTR-VTR genes (Fig. 2b, Data Fig. 5) and in flanking microRNAs and long non-coding RNAs Supplementary Table 9a), which supports the notion that at least (Extended Data Figs. 6, 7, Supplementary Table 10), we also did not parts of these chromosomes are paralogous (human chromosomes 12 find any one gene with more similarity to another that would allow

752 | Nature | Vol 592 | 29 April 2021 abInvertebrate common ancestor Invertebrate common ancestor

VTR VTR

Vertebrate ancestor: shared with hagsh Vertebrate ancestor: shared with hagsh VTR1 VTR2 VTR1 VTR2 SD SD

Vertebrate ancestor: shared with lamprey Vertebrate ancestor: shared with lamprey VTR1A VTR2A VTR1A VTR2A One round One round OTR VTR2B of WGD OTR VTR2B of WGD

Jawed vertebrate ancestor: shared with shark Jawed vertebrate ancestor VTR1A VTR2A X Mammals and teleost VTR1A VTR2A X Mammals and teleosts

OTR VTR2B X Tetrapods OTR VTR2B X Tetrapods SD Two rounds SD of WGD X Holostean VTR1B VTR2C VTR1B VTR2D and teleost sh X All vertebrates X Holostean X Birds VTR1C VTR2C X Birds and teleost sh X All vertebrates

Fig. 5 | Two proposed hypotheses for the evolution of OTR-VTR genes. segmental duplication was followed by two rounds of whole-genome a, Hypothesis 1 proposes the receptors evolved by an initial segmental duplication and specific losses (X), including in all vertebrates (blue X). Lines duplication (SD), then a one round of whole-genome duplication (WGD), connecting genes indicate that they are on the same chromosome in most followed by two segmental duplications in different vertebrate lineages and species. Alignments between sets of genes indicate the closest related then by losses (red X) in specific lineages. b, Hypothesis 2 proposes the initial paralogues.

us to make further conclusions about the evolution of gene subfami- Note 5). By contrast, our phylogenetic sequence analyses revealed lies (for example, sea lamprey and human) (Supplementary Notes 5, tree topologies with almost 1:1 consistency to our synteny-defined 6). Most sequences were too similarly divergent to inform paralogy relationships (Fig. 4). (Supplementary Note 5). Overall, the lack of further macrosynteny The combined phylogeny and synteny analyses supported a single and simple sequence-divergence resolution of paralogues within a VTR gene shared with an invertebrate ancestor (that is, represented species—combined with the better resolution on chromosome ortho- in sea squirt). This receptor then duplicated into what we designate logues and paralogues between species in which these gene regions the ancestral VTR1 and VTR2 on the same chromosome (that is, rep- reside—suggest a rapid radiation of receptor evolution near the origin resented in hagfish) (Fig. 4). Thereafter, the trees suggest that these of vertebrates. two genes expanded into three genes in the VTR1 subfamily (VTR1A, OTR and VTR1B) and three in the VTR2 subfamily (VTR2A, VTR2B and VTR2C), respectively, with VTR1A and VTR2A on the same chromo- Single chromosome origin of OTR-VTRs some and the paralogous OTR and VTR2B on another chromosome. To assess receptor origins, we performed ancestral analyses by Thereafter, the sister relationship of VTR1A and VTR1B in both trees mapping regions that contain the OTR-VTR genes against recon- suggests that one directly gave rise to the other, after the divergence structed chromosomes of the vertebrate or chordate ancestor from of jawless fish (based on absence of VTR1B in lampreys). Likewise, four independent studies14,23,25,27. Chromosome fragments contain- the sister relationship of VTR2B and VTR2A in the nucleotide tree is ing the OTR-VTR genes in vertebrates all mapped back to the same consistent with the synteny findings of one giving rise to the other, reconstructed chromosome (Supplementary Table 11). A previous and—together—their sister relationship to VTR2C is consistent with study28 suggested that VTR2C maps back to a separate ancestral one of them giving rise to it, after the divergence of sharks (based on chromosome: we believe that this is inaccurate because the region absence of VTR2C in sharks). that contains the gene we name VTR2C is entirely missing from the In stark contradiction to the synteny findings, the lamprey reconstruction that was used23, although it is present in other recon- VTR1A and OTR genes each clustered outside of their respective structions14,25,27 that were based on higher-quality amphioxus and synteny-defined VTR1 homologues among species and the same sea lamprey genome assemblies (Supplementary Table 11). These occurred for lamprey VTR2A and VTR2B for the VTR2 homologues findings are consistent with the fact that that vast majority of inver- (Fig. 4), which implies lamprey-specific duplications. One possible tebrates that have been examined have only one VTR gene13,17 (Sup- explanation for these contradictions is that there could be conver- plementary Table 5). gence within the lamprey OTR-VTR genes (possibly owing to higher GC content29) or that the divergence was so rapid at the origin of verte- brates that the true relationship is not easy to resolve using gene tree Synteny, phylogeny and receptor evolution inference. Consistent with the latter, the bootstrap support values are We generated phylogenetic trees of the receptor gene family across low (72–74%) for a more recent gene duplication. Consistent with the vertebrates using alignments of both the exon nucleotide (RAXML) and former, the lamprey exon sequences of all 4 receptors were among amino acid (TreeFam and TreeBeST5) sequences, and then mapped our the highest in GC content (60–69%) compared to other species (Sup- revised synteny-based naming onto the tree leaves. BLAST nucleotide plementary Table 13). The three amphioxus VTR sequences in our comparisons alone, and previous nomenclatures based on these analy- exonic tree cluster within species at 100% support (Fig. 4a), consistent ses (Table 1), yielded many contradictions with the synteny-defined with lineage-specific duplications (Supplementary Note 2b). There orthologues (Supplementary Table 12). We believe this is due to BLAST were some differences in local relationships in the exon and amino not returning alignments of the entire sequence, which in turn is due to acid trees (Fig. 4a, b), but these did not affect the major conclusions larger lineage divergences between those gene regions (Supplementary here (Supplementary Note 7).

Nature | Vol 592 | 29 April 2021 | 753 Article of only two receptor genes (VTR1 and VTR2) in inshore hagfish is more A universal nomenclature for OTR-VTR genes consistent with the paraphyly (separate phyla) of lampreys and hag- On the basis of the above findings, we propose a universal nomenclature fishes than the monophyly hypothesis35, as it does not require further for the OTR-VTR genes in which their root terms follow the ligand names inference of independent gene losses or gains or incomplete lineage (oxytocin receptor (OTR) and vasotocin receptor (VTR)) and their enu- sorting (Fig. 1a). Thus, our findings may have repercussions on a wider meration terms (1A, 1B, 2A, 2B and 2C) follow their evolutionary history: and highly debated topic—that of the evolution of vertebrate genomes the numbers 1 and 2 designate the original duplication, and the letters (Supplementary Note 9). A, B and C designate the subsequent subfamily duplications. The only Our revised understanding of the receptor relationships allows a exceptions we made were the decisions not to rename OTR as VTR1B or more holistic view of their functions. We generated a multiprotein cod- VTR1B as VTR1C (as the evolutionary history warrants), because we felt ing sequence alignment among the highest-quality assemblies of all this might be too radical of a departure from the common use. This is six receptors, and found that the seven transmembrane domains and further justified in that—although there is crosstalk in OT and VT bind- associated polar amino acids that interact with OT or VT have remained ing to these receptors—OT is the dominant ligand for OTR30. For the highly conserved in sequence or amino acid type, even 550 million VTR2 genes, we reordered the enumerations according to the inferred years after their common origin (Extended Data Fig. 10, Supplementary chronological order of duplications: VTR2A and VTR2B for the genes we Note 10). By contrast, the extracellular OT or VT binding domains and found in lampreys, and VTR2C for the gene that evolved in the ances- the intracellular G-protein binding domain became highly diversified tor of bony fishes. This universal nomenclature gives a single name from one receptor to another, predicting greater diversity in initial to each gene across vertebrates. The gene that is commonly known ligand binding and subsequent intracellular signalling. Nine amino as arginine vasopressin receptor 1A (AVRP1A) in mammals, vasotocin acid sites distinguish the VTR1 and VTR2 subfamilies (most in or near receptor 4 (VT4) in birds and vasotocin receptor (VasR) in frogs would, the transmembrane regions), but only one of these is in the G-protein in our revised nomenclature, be called vasotocin receptor 1A (VTR1A) binding region (Extended Data Fig. 10). All of the receptors use dia- (Table 1). The gene that is commonly known as oxytocin receptor (OXTR) cylglycerol, inositol triphosphate and Ca2+ for second-messenger in mammals, vasotocin receptor 3 (VT3) or mesotocin receptor (MTR) intracellular signalling—except for VTR2C, which uses cAMP (Sup- in birds and frogs, and isotocin receptor (ITR) in fish would be called plementary Table 31). The tissues in which the receptors have their oxytocin receptor (OTR). Similar changes from multiple names to a highest expression include the brain (except VTR2A), with expression single name apply to the other four receptors (Table 1). being highest in the adrenal gland. We could not find data available for signalling or expression for VTR2B in fish, but predict it will be similar to members of the VTR2A and VTR2C subfamilies. Finally, our Interpretations and evolutionary hypotheses findings that the OTR (represented by lamprey divergence) evolved We considered a model of OTR-VTR evolution in the context of two millions of years before the OT ligand (represented by elephant shark competing hypotheses of vertebrate genome evolution: one round divergence) suggest that the ancestral VT may have originally acted of whole-genome duplication and segmental duplications (Fig. 5a) through the OTR before OT evolved. This suggestion is supported by versus two rounds of whole-genome duplication (Fig. 5b). For both findings that, in some species, OT and VT bind to the OTR at similar hypotheses, we propose that the single VTR in the vertebrate ances- efficiencies8; a greater response of OTR to OT over VT is found for the tor had a tandem segmental duplication on the same chromosome first time in teleost fish8. at over 550 million years ago31 that gave rise to the ancestral VTR1 and In summary, we believe that our revised evolution-based and uni- VTR2 genes. Thereafter, in a one round of whole-genome duplication versal nomenclature will make translating findings across vertebrates in a gnathostome ancestor, one copy of each of the VTR1 and VTR2 much easier. It will help to inform our understanding of crosstalk genes gave rise to the VTR1A-VTR2A combination on one chromosome between some of the ligands and receptors, our understanding of paralogue and the OTR-VTR2B combination on the other chromosome genome evolution and could serve as a model for a broader universal paralogue. From here, the two hypotheses diverge. In hypothesis 1 gene nomenclature. (Fig. 5a), a segmental translocated duplication of the chromosomal region containing VTR1A gives rise to VTR1B in the ancestor of jawed vertebrates at over 500 million years ago and a segmental translocated Online content duplication of the region containing VTR2B gives rise to VTR2C in the Any methods, additional references, Nature Research reporting sum- common ancestor of other vertebrates with bony fish at over 450 mil- maries, source data, extended data, supplementary information, lion years ago. Segmental duplications have been found in other gene acknowledgements, peer review information; details of author contri- families at these evolutionary time points32,33. In hypothesis 2 (Fig. 5b), butions and competing interests; and statements of data and code avail- two rounds of whole-genome duplication before the divergence of ability are available at https://doi.org/10.1038/s41586-020-03040-7. gnathostomes from cyclostomes lead to four more receptors in the ancestor of jawed vertebrates at over 500 million years ago. Both 1. Knobloch, H. S. & Grinevich, V. Evolution of oxytocin pathways in the brain of vertebrates. hypotheses agree with lineage-specific losses of VTR2B in the ances- Front. Behav. Neurosci. 8, 31 (2014). tor of tetrapods, of VTR2A independently in mammals and teleost fish, 2. Meyer-Lindenberg, A., Domes, G., Kirsch, P. & Heinrichs, M. Oxytocin and vasopressin in of VTR1B in holostean and teleost fish, and of VTR2C in birds. However, the human brain: social neuropeptides for translational medicine. Nat. Rev. Neurosci. 12, 524–538 (2011). for hypothesis 2 to be true, complete independent losses of thus-far 3. Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate unidentified VTR1C and VTR2D genes and associated chromosome species. Nature https://doi.org/10.1038/s41586-021-03451-0 (2021). segments in an extinct species before its divergence into all other verte- 4. Jebb, D. et al. Six reference-quality bat genomes reveal evolution of bat adaptations. Nature 583, 578–584 (2020). brate lineages would be required. Our results are more parsimoniously 5. Hoyle, C. H. Neuropeptide families and their receptors: evolutionary perspectives. Brain explained by hypothesis 1 (one-round of whole-genome duplication)14 Res. 848, 1–25 (1999). with prior and subsequent segmental duplications (Supplementary 6. Acher, R. & Chauvet, J. Structure, processing and evolution of the neurohypophysial hormone–neurophysin precursors. Biochimie 70, 1197–1207 (1988). Note 8). Such a vertebrate evolutionary scenario is consistent with 7. Ocampo Daza, D., Lewicka, M. & Larhammar, D. The oxytocin/vasopressin receptor family expectations given a simple random mutational model34 that requires has at least five members in the gnathostome lineage, inclucing two distinct V2 subtypes. as few as 6 mutational steps, whereas models that invoke two rounds of Gen. Comp. Endocrinol. 175, 135–143 (2012). 8. Yamaguchi, Y. et al. The fifth neurohypophysial hormone receptor is structurally related whole-genome duplication require at least 9 steps in our case (Supple- to the V2-type receptor but functionally similar to V1-type receptors. Gen. Comp. mentary Note 8) or 12–18 under previous assumptions34. Our findings Endocrinol. 178, 519–528 (2012).

754 | Nature | Vol 592 | 29 April 2021 9. Lagman, D. et al. The vertebrate ancestral repertoire of visual opsins, alpha 26. Uno, Y. et al. Inference of the protokaryotypes of amniotes and tetrapods and the subunits and oxytocin/vasopressin receptors was established by duplication of their evolutionary processes of microchromosomes from comparative gene mapping. PLoS shared genomic region in the two rounds of early vertebrate genome duplications. BMC ONE 7, e53027 (2012). Evol. Biol. 13, 238 (2013). 27. Putnam, N. H. et al. The amphioxus genome and the evolution of the chordate karyotype. 10. Mayasich, S. A. & Clarke, B. L. The emergence of the vasopressin and oxytocin Nature 453, 1064–1071 (2008). hormone receptor gene family lineage: clues from the characterization of vasotocin 28. Yun, S. et al. Prevertebrate local gene duplication facilitated expansion of the receptors in the sea lamprey (Petromyzon marinus). Gen. Comp. Endocrinol. 226, neuropeptide GPCR superfamily. Mol. Biol. Evol. 32, 2803–2817 (2015). 88–101 (2016). 29. Zhang, H. et al. Lampreys, the jawless vertebrates, contain only two ParaHox gene 11. Jaillon, O. et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the clusters. Proc. Natl Acad. Sci. USA 114, 9146–9151 (2017). early vertebrate proto-karyotype. Nature 431, 946–957 (2004). 30. Song, Z. & Albers, H. E. Cross-talk among oxytocin and arginine-vasopressin receptors: 12. Hubley, R. et al. The Dfam database of repetitive DNA families. Nucleic Acids Res. 44, relevance for basic and clinical studies of the brain and periphery. Front. D81–D89 (2016). Neuroendocrinol. 51, 14–24 (2018). 13. Gwee, P. C., Tay, B. H., Brenner, S. & Venkatesh, B. Characterization of the 31. Kumar, S. & Hedges, S. B. A molecular timescale for vertebrate evolution. Nature 392, neurohypophysial hormone gene loci in elephant shark and the Japanese lamprey: origin 917–920 (1998). of the vertebrate neurohypophysial hormone genes. BMC Evol. Biol. 9, 47 (2009). 32. Buechi, H. B. & Bridgham, J. T. Evolution of specificity in cartilaginous fish glycoprotein 14. Smith, J. J. et al. The sea lamprey germline genome provides insights into programmed hormones and receptors. Gen. Comp. Endocrinol. 246, 309–320 (2017). genome rearrangement and vertebrate evolution. Nat. Genet. 50, 270–277 (2018). 33. Venkatesh, B. et al. Elephant shark genome provides unique insights into gnathostome 15. Kawada, T., Sekiguchi, T., Itoh, Y., Ogasawara, M. & Satake, H. Characterization of a novel evolution. Nature 505, 174–179 (2014). vasopressin/oxytocin superfamily peptide and its receptor from an ascidian, Ciona 34. Smith, J. J. & Keinath, M. C. The sea lamprey meiotic map improves resolution of ancient intestinalis. Peptides 29, 1672–1678 (2008). vertebrate genome duplications. Genome Res. 25, 1081–1090 (2015). 16. Garrison, J. L. et al. Oxytocin/vasopressin-related peptides have an ancient role in 35. Miyashita, T. et al. Hagfish from the Cretaceous Tethys Sea and a reconciliation of the reproductive behavior. Science 338, 540–543 (2012). morphological–molecular conflict in early vertebrate phylogeny. Proc. Natl Acad. Sci. 17. Liutkeviciute, Z., Koehbach, J., Eder, T., Gil-Mansilla, E. & Gruber, C. W. Global map USA 116, 2146–2151 (2019). of oxytocin/vasopressin-like neuropeptide signalling in insects. Sci. Rep. 6, 39177 36. Green, R. E. et al. Three crocodilian genomes reveal ancestral patterns of evolution (2016). among archosaurs. Science, 346, 1254449 (2014). 18. Roch, G. J., Tello, J. A. & Sherwood, N. M. At the transition from invertebrates to 37. Heimberg, A. M., Cowper-Sal Lari, R., Sémon, M., Donoghue, P. C. J. & Peterson, K. J. vertebrates, a novel GnRH-like peptide emerges in amphioxus. Mol. Biol. Evo. 31, 765–778 microRNAs reveal the interrelationships of hagfish, lampreys, and gnathostomes and the (2013). nature of the ancestral vertebrate. Proc. Natl Acad. Sci. USA 107, 19379–19383 (2010). 19. Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007). Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in 20. Rayko, E., Jabbari, K. & Bernardi, G. The evolution of introns in human duplicated genes. published maps and institutional affiliations. Gene 365, 41–47 (2006). 21. Lovell, P. V. et al. Conserved syntenic clusters of protein coding genes are missing in Open Access This article is licensed under a Creative Commons Attribution birds. Genome Biol. 15, 565 (2014). 4.0 International License, which permits use, sharing, adaptation, distribution 22. Lyons, E. & Freeling, M. How to usefully compare homologous plant genes and and reproduction in any medium or format, as long as you give appropriate chromosomes as DNA sequences. Plant J. 53, 661–673 (2008). credit to the original author(s) and the source, provide a link to the Creative Commons 23. Nakatani, Y., Takeda, H., Kohara, Y. & Morishita, S. Reconstruction of the vertebrate license, and indicate if changes were made. The images or other third party material in this ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome article are included in the article’s Creative Commons license, unless indicated otherwise in a Res. 17, 1254–1265 (2007). credit line to the material. If material is not included in the article’s Creative Commons license 24. Nakatani, Y. & McLysaght, A. Genomes as documents of evolutionary history: a and your intended use is not permitted by statutory regulation or exceeds the permitted use, probabilistic macrosynteny model for the reconstruction of ancestral genomes. you will need to obtain permission directly from the copyright holder. To view a copy of this Bioinformatics 33, i369–i378 (2017). license, visit http://creativecommons.org/licenses/by/4.0/. 25. Smith, J. J. et al. Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nat. Genet. 45, 415–421, e1–e2 (2013). © The Author(s) 2021

Nature | Vol 592 | 29 April 2021 | 755 Article Methods Brafl1.home.html with the latest version of the amphioxus genome (Branchiostoma floridae v.2.0), whereas previously reported data13 No statistical methods were used to predetermine sample size. The are based on the first version of the genome (B. floridae v.1.0). For the experiments were not randomized, and investigators were not blinded inshore hagfish genome assembly, the contigs were relatively short to allocation during experiments and outcome assessment. and not fully annotated, and thus we first BLAT-searched all the OT, VT and OTR-VTR sequences of all the aforementioned species against Overall synteny and BLASTn analyses the hagfish genome in Ensembl, found two putative OTR-VTRs in two To define orthology in the OT, VT and OTR-VTRs in all vertebrates, we separate contigs in the hagfish assembly, and then used the ‘Region used interspecies synteny analyses at three scales: a manual 10-gene comparison’ tool of Ensembl to map each gene of these contigs against window microsyteny analyses using BLAT and BLAST38,39 searches the human, zebrafish and lamprey genomes (Supplementary Table 4f, and cross-species genome alignments; a more automated 100-gene g). BLAST gave many gene hits in the hagfish genome, but only with macrosynteny window using SynFind and GeVo22; and automated short segments aligning to OT and VT orthologues in other species. chromosomal-scale alignments with syntenic dot plots using Syn- Thus, to determine whether they were real OT or VT orthologues, we Map240. To define paralogy and further trace the evolutionary his- used the ‘Gene Tree’ tool of Ensembl that constructs a phylogeny using tory of the genes, we used intraspecies synteny analysis, searching the entire protein sequence, with the sea lamprey VT as reference. For for paralogous genes in 10-Mb windows. Microsynteny was useful for the receptors, we used our data from the SynMap2 dot plots (described determining orthologous and paralogous relationships between genes in ‘Chromosome-scale macrosynteny between species’) and included in the majority of the vertebrate lineages. Macrosynteny was useful in the synteny of the hagfish receptors the gene hits that appear on the for determining orthologous and paralogous relationships in cases in chromosomes in which the OTR-VTRs are located in human, chicken, which the microsynteny was weaker, such as between genes found in zebrafish and sea lamprey (Supplementary Table 4f, g). lampreys or hagfish with the rest of the vertebrates. Sequence identity was determined using BLASTn to understand relationships between the Macrosynteny between species in approximately 100-gene per cent identity and synteny (Supplementary Table 12). Only results windows with a bit score >40 and hits with high probability E value <10−4 were We generated gene sequence alignments between pairs of species using kept. We describe the specific methods for each synteny approach in SynFind22 (density, LastZ defaults). This results in a matrix contain- ‘Microsynteny between species in approximately 10-gene windows’, ing syntenic gene-hit values in the reference species relative to query ‘Macrosynteny between species in approximately 100-gene windows’ species along with their chromosomal locations. This data matrix was and ‘Chromosome-scale macrosynteny between species’. parsed and visualized using a custom R script (https://github.com/ ggedman/OT_VT_synteny). First, a 100-gene window centred around Microsynteny between species in approximately 10-gene windows a given receptor gene in the reference organism (x axis) was defined We microsynteny analysis by manually scanning annotated align- using biomaRt (v.3.10). As we move 5′ (left) or 3′ (right) from zero (the ments for 5 protein-coding genes before and after each focus gene focus gene) and if synteny exists, the number of gene hits for a given (Supplementary Table 4a–e) in 35 species spanning all major vertebrate receptor in the query species shows a cumulative increase. This allowed lineages (Supplementary Table 2). The candidate genes in each species us to identify large stretches of homologous sequences interspersed (accession number or gene identifier in Supplementary Table 4a–e) by divergent sequences. were first selected by BLAT and BLAST searches using the UCSC genome browser and alignment (http://genome.ucsc.edu/)38 and the SynFind Chromosome-scale macrosynteny between species tool from the CoGe comparative genomics research platform22. The We used SynMap240 to generate syntenic dot plots of chromosome NCBI and Ensembl41 (v.95) database genome alignments were used to sequence alignments between species that contain OTR-VTRs (Sup- identify the neighbouring genes. For the neighbouring genes, we kept plementary Tables 15–30). SynMap2 identifies collinear sets of genes or in our Supplementary Tables the family gene names used in the genome regions of sequence similarity to infer synteny between two sequences, annotation of each species, even though in some cases—we believe and generates a dot plot of the results. We used the default parameters erroneously—different family names have been given to the ortholo- (as of December 2018), except for ‘Minimum number of aligned pairs’. gous gene in different species (for example, FABP1A in spotted gar and This parameter defines the minimum number of homologous genes FABP1B.1 in stickleback; Supplementary Table 4a). For the species that (based on last default parameters) that should be found in a 20-gene had more lineage-specific duplications, we labelled the gene that shared distance for these genes to be considered syntenic and to appear on more synteny with the orthologue in other vertebrate lineages with ‘a’ the dot plot. In more closely related lineages, we selected three as a (for example, OTRa), and labelled the copy with ‘b’ (for example, OTRb). minimum number (for example, between sea lamprey on the one hand, We listed the aliases in NCBI and Ensembl for each focus gene in each and Japanese medaka, or zebrafish, frog and chicken genomes on the organism (‘Aliases’ column in Supplementary Table 4a–e) and included other); for more distantly related species, we used two (for example, the most frequent ones in Table 1. When our target genes appeared to between hagfish on the one hand, and sea lamprey, or Japanese medaka, be lost in a species (no initial BLAST hit), we searched the surround- zebrafish, frog, chicken and human genomes on the other). Addition- ing gene territory to determine whether only the gene of interest or a ally, because the hagfish contigs were shorter than most other assem- larger block of genes were deleted, or whether the deletion was due to blies (making synteny more difficult to identify), we also ran a dot plot an incomplete genome assembly or assembly artefact. with 1 as the minimum number to search for all possible homologous For some species with more-fragmented genome assemblies or hits, regardless of synteny. annotations or greater divergences in NCBI and Ensembl, we ana- To test for significant differences, we ran a χ2 test with distinct lysed other higher-quality assemblies and annotations. This included samples of genes on the difference of the proportions between the the VGP zebra finch, Anna’s hummingbird, pale spear-nosed bat and first two chromosomes with the highest number of gene hits, using platypus genome assemblies3. For the Japanese lamprey, we included the number of genes in the super-scaffold of the reference species previously published synteny data10. For the sea lamprey, we used the (for example, sea lamprey) as sample size: Supplementary Table 30 assembly of the germline genome14 and analysed it with BLAST, Genome provides confidence intervals, degrees of freedom and P values. For Browser and Gene Search tools (https://genomes.stowers.org/organ- cases that reached significance, to confirm that the number of hits ism/Petromyzon/marinus). For amphioxus, we used the BLAST and between two species was independent of the number of protein-coding Gene Browser tools available at https://genome.jgi.doe.gov/Brafl1/ genes located on the chromosome of the query organism, we applied a gene density-normalization test to rule out the possibility that the separately to understand the divergence of the paralogous genes, fol- chromosomes with most gene hits were owing to them containing the lowing a previously proposed paradigm43, using the maximum score most genes: we did not find such correlations with our macrosynteny and per cent identities of the comparisons that were above the thresh- analyses. old (maximum score >40 and E value < 10−4). We performed a similar analysis for VTR1B and VTR2A in elephant shark and coelacanth, to Macrosynteny within species in approximately 10-Mb windows test whether sequence identity can help to solve ancestry questions. We primarily used the , as it is the best assembled To shed light on the orthology between the inshore hagfish and the genome and therefore subject to generating fewer errors. We listed all sea lamprey OTR-VTRs, we compared their exons and introns as well. genes found in a 10-Mb window from the present OTR-VTRs (for exam- To analyse conserved non-coding RNA synteny around the OTR-VTRs, ple, OTR, VTR1A, VTR1B and VTR2C in mammals) as well as absent ones we looked for them in alignments in all the species studied in Ensembl, (for example, VTR2A and VTR2B, which are absent in mammals). We in the miRbase (http://www.mirbase.org/; release 22), and the chose a 10-Mb window because this genomic region size often captured miRviewer44 database (28 February 2012 update). We aligned macrosynteny of >40 genes, allowing within- and between-species (BLASTn) long non-coding RNA regions within species (sea lamprey macrosynteny analyses described in ‘Macrosynteny between spe- and human). cies in approximately 100-gene windows’ to be comparable. We then searched each gene in the HUGO Gene Nomenclature Committee Data- Gene tree phylogeny analyses base (https://www.genenames.org/) to classify its gene-family. For the Exonic nucleotide tree. Exonic sequences from all the OTR-VTRs from deleted genes, we defined their territories by manually identifying in representative species that had the most-complete assembled genes the human genome the genes around spotted gar VTR2B and chicken were aligned with MAFFT under the E-INS-i parameter set, which is VTR2A; some of these syntenies around the deleted OTR-VTRs had optimized for sequences with multiple conserved domains and long previously been identified8,9, which we confirmed. gaps. Any incomplete non-lamprey OTR-VTR of less than 1,000 bp was excluded, as alignments on short sequences often lack power to resolve Evolutionary history analyses of OT and VT species relationships, resulting in weakly supported gene trees. Because We noted annotated DNA transposable elements in the UCSC Genome of the basal phylogenetic position of the lamprey, all lamprey OTR-VTRs Browser in close vicinity of the OT and VT genes (except for the elephant (754 bp and longer) were included. From this alignment, we generated shark genome, which was not annotated for DNA transposable ele- a phylogenetic maximum likelihood tree using GTRGAMMA model of ments), and thus we quantitatively searched for adjacent transposable RAXML (version 8.2.10)45, with 1,000 replicates. We calculated the GC elements in the human and chimpanzee genomes using RepeatMasker content of all the exonic sequences using http://www.endmemo.com/ (http://genome.ucsc.edu/)38 and obtained information for each spe- bio/gc.php (Supplementary Table 13). cific transposable element using Dfam 2.012. We calculated GC con- tent using ENDMEMO (http://www.endmemo.com/bio/gc.php/). We Protein amino acid tree. A maximum likelihood phylogenetic tree was aligned the introns of human OT and VT in all possible combinations constructed on one representative amino acid sequence for every gene using DIALIGN42 and compared intron lengths using Serial Cloner v.2.6 in every species, using TreeFAM and TreeBeST5 pipeline in the gene (http://serialbasics.free.fr/Serial_Cloner.html). For relative OT and VT tree tool package of Ensembl (https://www.ensembl.org/info/genome/ orientations, we examined whether they were in the same direction compara/homology_method.html). Thereafter, we manually curated (tail-to-head) or in opposite directions (tail-to-tail) in the annotated the Ensembl tree (gene tree identifier: ENSGT00760000119156) using positions in each species. In the cases in which OT and VT were found the universal nomenclature that we propose here. All the sequences in opposite directions, we determined which gene was inverted accord- used to generate both trees, sequence alignments and Newick files can ing to the orientation of other genes in the territory. In addition to be found at https://github.com/constantinatheo/otvt. the genomes used for all other analyses of this study (Supplementary Table 2), we also used the koala (Phascolarctos cinereus) (phaCin_unsw_ Reporting summary v4.1; GCF_002099425.1) and the grey short-tailed opossum (Monodel- Further information on research design is available in the Nature phis domestica) (MonDom5; GCF_000002295.2) genomes to include Research Reporting Summary linked to this paper. orientation data from the marsupial clade in Supplementary Table 8.

Evolutionary history analyses of OTR-VTRs Data availability To assess in which ancestral vertebrate chromosomes the OTR-VTRs All the data used in this study can be found in Supplementary Tables 2–30, originated, we used four ancestral chromosome models from the lit- and at https://github.com/constantinatheo/otvt. Any other relevant erature14,23,25,27, in which the reconstructed chromosomes were based on data are available from the corresponding authors upon request. different species and different genome qualities. Specifically, human, mouse, dog, chicken and tetraodon genomes were used in ref. 23; human, chicken, stickleback, pufferfish, sea squirt, amphioxus, sea urchin, Code availability fruitfly and sea anemone genomes in ref. 27; human, chicken and sea The code used in this study can be found at https://github.com/con- lamprey (somatic) genomes in ref. 25; and chicken, spotted gar and sea stantinatheo/otvt and https://github.com/ggedman/OT_VT_synteny. lamprey (germline) genomes in ref. 14. We searched for the presence of annotated OTR-VTRs in four outgroup invertebrate lineages (through 38. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002). literature review, BLAST and BLAT searches)—namely in sea squirt, 39. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990). roundworm, California sea hare and amphioxus. For the amphioxus 40. Haug-Baltzell, A., Stephens, S. A., Davey, S., Scheidegger, C. E. & Lyons, E. SynMap2 and genome (B. floridae v.2.0), we performed BLAT queries on OTR-VTR SynMap3D: web-based whole-genome synteny browsers. Bioinformatics 33, 2197–2198 FASTA sequences from all species studied using the JGI genome browser (2017). 41. Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46, D754–D761 (2018). (https://genome.jgi.doe.gov/portal/). 42. Morgenstern, B., Prohaska, S. J., Pöhler, D. & Stadler, P. F. Multiple sequence alignment To test which sea lamprey receptor(s) most probably represents the with user-defined anchor points. Algorithms Mol. Biol. 1, 6 (2006). orthologous ancestral gene(s), we compared the sea lamprey OTR-VTRs 43. Xu, G., Guo, C., Shan, H. & Kong, H. Divergence of duplicate genes in exon–intron structure. Proc. Natl Acad. Sci. USA 109, 1187–1192 (2012). in all possible combinations to each other using BLASTn (same param- 44. Kiezun, A. et al. miRviewer: a multispecies microRNA homologous viewer. BMC Res. eters). We compared the exons and introns of the identified genes Notes 5, 92 (2012). Article

Catalunya (Government of Catalonia) – 2017-SGR-341; and E.D.J. from the Howard Hughes 45. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of Medical Institute and the Rockefeller University. large phylogenies. Bioinformatics 30, 1312–1313 (2014). 46. Yachdav, G. et al. MSAViewer: interactive JavaScript visualization of multiple sequence Author contributions C.T. conceived the idea, developed the evolutionary proposal, alignments. Bioinformatics 32, 3501–3503 (2016). performed BLAST, BLAT, synteny, ancestral, phylogenetic (protein), long non-coding RNA and 47. Gimpl, G. & Fahrenholz, F. The oxytocin receptor system: structure, function, and transposable element analyses, reviewed the literature, made all Figures and Tables and wrote regulation. Physiol. Rev. 81, 629–683 (2001). the first draft. G.G. performed BLAST, BLAT and synteny analyses. J.A.C. performed phylogenetic exon tree analyses. C.B. cosupervised the study. E.D.J. conceived the idea, cosupervised the study, proposed the revised nomenclature and helped to write the Acknowledgements We thank the leadership of the VGP for early access to high-quality manuscript. All authors contributed to editing the manuscript. assemblies: in particular, S. Vernes for the pale spear-nosed bat of the Bat1K; G. Zhang for the platypus; and the B10K group for the Anna’s hummingbird and zebra finch assemblies. We Competing interests The authors declare no competing interests. thank many colleagues—especially D. Larhammar, Y. Nakatani and J. Smith—for useful discussions on the revised gene nomenclature and findings of this study, and also for their Additional information feedback on how to assess genome duplications. C.T. was supported by funds from the Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020- Rockefeller University and the Generalitat de Catalunya; G.G. from the Rockefeller University 03040-7. and an NSF Graduate Research Fellowships Program (GRFP); J.A.C. from the Rockefeller Correspondence and requests for materials should be addressed to C.T. or E.D.J. University; C.B. from the Spanish Ministry of Economy and Competitiveness/FEDER (grant Peer review information Nature thanks Jeramiah Smith and the other, anonymous, reviewer(s) FFI2016-78034-C2-1-P), MEXT/JSPS Grant-in-Aid for Scientific Research on Innovative Areas for their contribution to the peer review of this work. 4903 (Evolinguistics: JP17H06379; principal investigator K. Okanoya) and Generalitat de Reprints and permissions information is available at http://www.nature.com/reprints. Extended Data Fig. 1 | Lineage specific OT-VT specializations. a, Protein these genomes can be found at http://www.ensembl.org/Multi/GeneTree/Imag phylogenetic tree for VT in hagfish and lamprey relative to other vertebrates. e?collapse=none;db=core;gt=ENSGT00390000004511. b, Triplication of the Maximum likelihood amino acid phylogenetic tree generated via the Ensembl pale spear-nosed bat OT-VT region. An approximately10-gene window of ‘Gene tree’ tool (gene tree identifier: ENSGT00390000004511) that uses the synteny between human, megabat and pale spear-nosed bat is shown. In Gene Orthology/Paralogy prediction method pipeline. The longest available megabat, OT, VT and their syntenic genes are found in three different scaffolds protein of each species was used. The tree is reconciled with a species tree, (three boxes). In the pale spear-nosed bat with a higher-quality assembly, generated by TreeBeST. Left, red boxes, inferred gene duplication node; blue a syntenic triplication of the OT-VT region is found. c, gEVAL alignment boxes, inferred speciation events; turquoise boxes, ambiguous nodes. Right, analyses (https://geval.sanger.ac.uk/index.html). This panel shows gapless green bars, multiple amino acid alignment made with MUSCLE; white areas, Pacbio-based long-read contigs (dark blue) and gapless Bionano optical maps gaps in the alignment; dark green bars, consensus alignments. We curated the (yellow), which span through the entire region with the OT and VT duplications tree and renamed genes using the universal nomenclature proposed in this in the pale spear-nosed bat, without any noticeable assembly errors. Article. The tree with the current nomenclature used in the annotations of Article

Extended Data Fig. 2 | Lost VTR receptors in human, representative of chromosome 3 in the human genome. b, Genomic territory of the deleted mammals. a, Genomic territory of the deleted VTR2B in the human genome. VTR2A in the human genome. The genomic territory before the chicken VTR2A The genomic territory before the spotted gar VTR2B (top) was found in human (top) was found in human (100–115 Mb) (bottom). The genomic chromosome 3 (49–51 Mb), 40 Mb after the location of human OTR (bottom). territory after the chicken VTR2A was found in human chromosome 12 (40–43 The genomic territory before the spotted gar VTR2B was also found in human Mb). The solid back region links two regions from chromosomes 7 and 12 in the chromosome 3 (3–5 Mb), 5 Mb before the location of human OTR. Text colours human genome. denote orthologous genes. Solid black region links two different regions on Extended Data Fig. 3 | Macrosynteny SynFind analyses. a–d, Comparisons chimpanzee OTR region (red line) shows 17 syntenic gene matches within 20 between closely related species (human and chimpanzee) for four receptors, genes 5′ (left) of human OTR, and 18 matches within 20 genes 3′ (right) of human showing maximum syntenies found using this method. e–h, Comparisons OTR. If the reference OTR-VTR does not show any match, then it is 0 on the y axis between intermediately related species (human and chicken) for the same four (for example, the chimpanzee VTR1B (shown in green in a)); if the reference receptors in a–d. i–l, Comparisons between distantly related species (human OTR-VTR matches only the query OTR-VTR, it reaches 1 (for example, and fish). m–p, Comparisons between distantly related non-human species. chimpanzee VTR1A (shown in in blue in d) was orthologous only to human On the x axis, 0 represents the query OTR-VTR in the query organism and the VTR2C). If the reference OTR-VTR is not orthologous to the query OTR-VTR, but numbers represent the number of genes on the 5′ (left) and 3′ (right) of the does show gene matches in the neighbouring territory, then it indicates a query OTR-VTR in the genome. The y axis shows the cumulative number of the deletion of the receptor in the query species (for example, chicken VTR1A matched homologous (orthologous or paralogous) syntenic genes in the (shown in blue in f)). reference genome for each reference receptor. For example, in a the Article

Extended Data Fig. 4 | Additional chromosomal SynMap2 analyses with other species containing a VTR2A or VTR2B sequence, with a 1-gene minimum lamprey and hagfish. Bar graphs were created from dot plots. a, Additional per 20-gene window criterion. d, Same synteny analyses as in c, but with a scaffold in sea lamprey (scaffold 49) with a VTR gene is most syntenic with 2-gene minimum per 20-gene window criterion. e, The inshore hagfish scaffold chromosomes of other species containing a VTR1-VTR2 combination, with a FYBX02010841.1, in which the putative VTR1 is located, is most syntenic with 3-gene minimum per 20-gene window criterion. b, SynMap2 dot plot between chromosomes of other species containing a VTR1A or OTR sequence, with a sea lamprey scaffold 49 and scaffolds 10 and 27, with a 3-gene minimum per 10- 1-gene minimum per 20-gene window criterion. f, Same synteny analyses as in gene window criterion. c, The inshore hagfish scaffold FYBX02010521.1, in e, but with a 2-gene minimum per 20-gene window criterion. which the putative VTR2 is located, is most syntenic with chromosomes of Extended Data Fig. 5 | Interspecies BLASTn comparisons between exons b, Two-way comparisons of sea lamprey exons and introns of VTR2B with and introns of all sea lamprey OTR-VTRs in multiple combinations. VTR2A, and OTRa with VTR1A. Maximum scores and per cent identities are a, Three-way comparisons of sea lamprey VTR1A, VTR2A and VTR2B exons shown for the alignments that exceed a threshold (maximum score > 40 and (boxes) and introns (lines), and two-way comparisons of OTRa and VTR2B. E value < 10−4). Sequence length is shown in bp. Article

Extended Data Fig. 6 | Interspecies non-coding RNA paralogous synteny (maximum score > 40 and E value < 10−4) in the BLASTn comparisons. Maximum analyses. a, Long non-coding RNAs around the OTR and VTR1 genes within score (bit score) and per cent identity are shown for each pair of long human. b, Long non-coding RNAs around the OTR-VTRs in sea lamprey. Lines non-coding RNAs. Genomic location is in Mb. connect the long non-coding RNAs that shared identity beyond a threshold Extended Data Fig. 7 | Intraspecies BLASTn comparisons between exons lamprey VTR2A and OTRa. Maximum scores and per cent identities are shown and introns of OTR-VTRs. a, Two-way comparisons of exons (boxes) and for the alignments that yielded results beyond a threshold (maximum score > 40 introns (lines) of elephant shark VTR1B with sea lamprey VTR1A and OTRa. and E value <10−4). Sequence length is shown in bp. b, Two-way comparisons of exons and introns of coelacanth VTR2C with sea Article

Extended Data Fig. 8 | Protein phylogenetic tree for OTR-VTRs with the large vertebrate groups, such as tetrapods (for example, VT1 to VT4 in birds, currently used gene nomenclature. The same amino acid tree as in Fig. 4b, AVPR3 in mammals and AVPR4 in fish), are not shown. but labelled with the nomenclature used to date. Further variations within Extended Data Fig. 9 | Microsynteny for VTR2A across vertebrates and PMZ_0042163-RA on scaffold 10 (Supplementary Table 14). Orthologous genes VTR2A and OTRb within sea lamprey. An approximately 14-gene window are filled with the same colour; genes found in the territory of the sea lamprey around the VTR2A orthologue across species is shown. In the sea lamprey, OTRb are further outlined in black lines. Further discussion is provided in is our revised nomenclature for PMZ_0045207-RA/PMZ_0032217-RA on Supplementary Note 2. scaffold 49 (Supplementary Table 14), and VTR2A is our revision for Article

Extended Data Fig. 10 | MAFFT alignment of the OTR-VTRs of the binding domains is based on findings with OTR47. Amino acids marked with an best-quality assemblies available (human for OTR, VTR1A, VTR1B and asterisk are the OT polar-interacting sites to the receptor; amino acids marked VTR2C; zebra finch for VTR2A; and clingfish for VTR2B). The MAFFT with a # are differences between the VTR1 and VTR2 subfamilies. Colour coding alignment using the FFT-NS-I parameter was visualized with the MSA viewer46. of the amino acids is according to Clustal X (blue, hydrophobic; red, positive The identifiers and protein sequences used, along with the alignment file can charge; green, polar; pink, cysteines; orange, glycines; yellow, prolines; cyan, be found in https://github.com/constantinatheo/otvt. The functional aromatic; http://www.jalview.org/help/html/colourSchemes/clustal.html). annotation of transmembrane domains (TM) and intracellular loops (IT) and      

()00123)4564789@A)0B2CD j80T62adqXkYA1)V84)3)9U)9(X    E82@9358@15FG89@A)0B2CD  lmlmhlhnn  

H13)0@647I9PP80G    

Q8@901H12180RAS62A12@)6P30)T1@A10130)59R6F6U6@G)V@A1S)0W@A8@S139FU62AXYA62V)0P30)T65122@09R@901V)0R)4262@14RG845@084238014RG   

64013)0@647X`)0V90@A1064V)0P8@6)4)4Q8@901H12180RA3)U6R612a211b9@A)02cH1V10112845@A1d56@)068Ue)U6RG(A1RWU62@X     

I@8@62@6R2   `)08UU2@8@62@6R8U848UG212aR)4V60P@A8@@A1V)UU)S6476@1P2801301214@64@A1V67901U17145a@8FU1U17145aP864@1f@a)0g1@A)5221R@6)4X

4h8 ()4V60P15

o YA11f8R@28P3U126i1BpCV)018RA1f3106P14@8U70)93hR)456@6)4a76T14828562R01@149PF10845946@)VP182901P14@

o b2@8@1P14@)4SA1@A10P182901P14@2S101@8W14V0)P562@64R@28P3U12)0SA1@A10@A128P128P3U1S82P182901501318@15UG YA12@8@62@6R8U@12@B2C9215bQqSA1@A10@A1G801)41r)0@S)r26515 o sptuvwxyyxpv€‚€‚v‚ƒx„t v†v ‚w‡ˆ† v‚xttuv†uvp‰yv ‚w‡ˆ†vyx‡vwxy‘t’v€wƒpˆ“„‚vˆpv€ƒv”€ƒx ‚v‚w€ˆxp•

o b512R063@6)4)V8UUR)T8068@12@12@15

o b512R063@6)4)V84G8229P3@6)42)0R)001R@6)42a29RA82@12@2)V4)0P8U6@G84585–92@P14@V)0P9U@63U1R)P38062)42 bV9UU512R063@6)4)V@A12@8@62@6R8U3808P1@10264RU95647R14@08U@14514RGB1X7XP1842C)0)@A10F826R12@6P8@12B1X7X01701226)4R)1VV6R614@C o bQqT8068@6)4B1X7X2@84580551T68@6)4C)0822)R68@1512@6P8@12)V94R10@864@GB1X7XR)4V6514R164@10T8U2C

`)049UUAG3)@A1262@12@647a@A1@12@2@8@62@6RB1X7X—a€a‡CS6@AR)4V6514R164@10T8U2a1VV1R@26i12a5170112)VV0115)P845˜T8U914)@15 o ™ˆdv˜vd‰t„‚v‰‚v’‰w€vd‰t„‚veƒpd‡v‚„ˆ€‰†t•

o `)0f8G12684848UG262a64V)0P8@6)4)4@A1RA)6R1)V306)02845g80W)TRA864g)4@1(80U)21@@6472

o `)0A61080RA6R8U845R)P3U1f5126742a6514@6V6R8@6)4)V@A18330)3068@1U1T1UV)0@12@2845V9UU013)0@647)V)9@R)P12

o d2@6P8@12)V1VV1R@26i12B1X7X()A14g2 ae1802)4g2‡Ca6456R8@647A)S@A1GS101R8UR9U8@15

s„‡ve†vwxttw€ˆxpvxpv‚€‰€ˆ‚€ˆw‚vhx‡v†ˆxtxiˆ‚€‚vwxp€‰ˆp‚v‰‡€ˆwt‚vxpvy‰puvxhv€ƒv‘xˆp€‚v‰†xd• I)V@S801845R)51 e)U6RG64V)0P8@6)48F)9@8T86U8F6U6@G)VR)P39@10R)51 q8@8R)UU1R@6)4 p19215qrT10@1F08@1845s64T10@1F08@1231R612g714)P12aSA)21tq2845u14f84W8221PFUG8RR1226)449PF102R84F1V)94564 I933U1P14@80GY8FU1nXbUU@A1Q(fthd421PFUhu141tq2)V@A171412S12@95615R84F1V)94564I933U1P14@80GY8FU12qBIq8rIq1C845sX bUU@A1714121v914R129215V)0@A13AGU)7141@6R@0112R84F1V)945A101DA@@32Dhh76@A9FXR)PhR)42@84@648@A1)h)@T@X

q8@8848UG262 p19215@A1fEbYB8T86U8FU164@A1w(I(714)P1F0)S2108458U674P14@kU82@9358@1DQ)TxalmnyChfEbIYBTlXxXmzCaIG4g83lau1{)B4) T1026)48T86U8FU1kU82@9358@1DI13@n|almnxC845IG4`645B4)T1026)48T86U8FU1kU82@9358@1I13@n|almnxC@))U2V)0@A12G4@14G848UG212X `)0@A1218U8P301GB7eP80nXmXxCaS19215@A1fEbIYau14)P1f0)S210845u141I180RA@))U28T86U8FU164A@@32Dhh714)P12X2@)S102X)07h )078462Phe1@0)PGi)4hP806492X`)0@A18P3A6)f92BfXVU)06581TlXmCaS19215@A1fEbIY845u141f0)S210@))U28T86U8FU164A@@32Dhh 714)P1X–76X5)1X7)Thf08VUnhf08VUnXA)P1XA@PUXIG4`6450129U@2S101V90@A1038021592647F6)P8H@BTqXnmC845T6298U6i15926478R92@)P 84H2R063@BTqX|XnCBA@@32Dhh76@A9FXR)Ph7715P84h}Y~{Y~2G4@14GCX

p1v984@6@8@6T1UG2180RA15V)0qQb@08423)28FU11U1P14@2BYd2C80)945@A1}Y845{Y0176)464@A1A9P84845RA6P384i11714)P12 92647@A1H1318@g82W10@))UBU82@9358@1Dg80RAlmalmnrC64@A1w(I(u14)P1f0)S210BA@@3Dhh714)P1X9R2RX159hC845S1)F@86415 64V)0P8@6)4V)018RA231R6V6RYdT68qV8PlXmXp1R8UR9U8@15@A1u(R)4@14@92647dQqgdg}B4)T1026)4)0U82@9358@158@18T86U8FU1C BA@@3DhhSSSX145P1P)XR)PhF6)h7RX3A3hCXp18U67415@A164@0)42)VA9P84}Y845{Y648UU3)226FU1R)PF648@6)4292647qtbEtuQ 

BTlXlXnC845R)P38015@A1U147@A)V@A164@0)42S6@A@A1A67A106514@6@GBV602@64@0)4)V}YT2XV602@64@0)4)V{YC92647@A1I1068U(U)410  

TXlX|BA@@3Dhh21068UF826R2XV011XV0hI1068U~(U)410XA@PUCX ! " # $ % &

Y)848UGi1R)4210T154)4rR)5647HQb2G4@14G80)945@A1}YHr{YH2aS1U))W15V)0@A1P648U674P14@2648UU@A1231R6122@9561564 ' d421PFUBTxrCa64@A1P6HF821BA@@3DhhSSSXP60F821X)07hkP6HF821ll01U1821Ca845@A1P6HT61S1058@8F821BU82@9358@1D`1FlyalmnlCX

YA11f)46R714121v914R12S1018U67415S6@Agb``YBTC94510@A1drtQIr63808P1@10X`0)P@A628U674P14@a8eAGU)7141@6Rg8f6P9P E6W1U6A))5@011S82714108@1592647Hb€gEg2BTyXlXnmCuYHubggbP)51US6@Anmmm013U6R8@12X

YA130)@164rR)5647g8f6P9PE6W1U6A))53AGU)7141@6R@011S82R)42@09R@15S6@A@A1ƒu141@011„@))U64d421PFUBTxrCBu141Y011tqD

dQIuYmm|mmmmnnxnr|CD7141@0112S101R)42@09R@1592647)4101301214@8@6T18P64)8R6521v914R1V)01T10G7141641T10G231R612  92647Y011`bgBTxC845Y011f1IYBTnXxXlC3631U6416484d421PFUBTxrC38RW871X     

bUU01U1T84@01V1014R12845U64W28018T86U8FU164@A1gg1@A)52g21R@6)4X   

`)0P8492R063@29@6U6i647R92@)P8U7)06@AP2)02)V@S801@A8@801R14@08U@)@A1012180RAF9@4)@G1@512R06F156439FU62A15U6@108@901a2)V@S801P92@F1P8518T86U8FU1@)156@)02h01T61S102X  

p12@0)47UG14R)90871R)51513)26@6)4648R)PP946@G013)26@)0GB1X7Xu6@‚9FCXI11@A1Q8@901H12180RA79651U6412V)029FP6@@647R)51c2)V@S801V)0V90@A1064V)0P8@6)4X 



q8@8    e)U6RG64V)0P8@6)48F)9@8T86U8F6U6@G)V58@8   

bUUP8492R063@2P92@64RU951858@88T86U8F6U6@G2@8@1P14@XYA622@8@1P14@2A)9U530)T651@A1V)UU)S64764V)0P8@6)4aSA101833U6R8FU1D  

rbRR1226)4R)512a946v916514@6V6102a)0S1FU64W2V)039FU6RUG8T86U8FU158@821@2 

rbU62@)VV679012@A8@A8T1822)R68@1508S58@8  rb512R063@6)4)V84G012@06R@6)42)458@88T86U8F6U6@G 

bUU@A158@8845R)512921564@A622@95GR84F1V)94564@A1I933UXY8FU12~YA1)V84)3)9U)91fR1U5)R9P14@84564@A1V)UU)S647513)26@)0612DA@@32Dhh76@A9FXR)Ph  R)42@84@648@A1)h)@T@kA@@32Dhh76@A9FXR)Ph7715P84h}Y~{Y~2G4@14GX 

`61U5r231R6V6R013)0@647 eU182121U1R@@A1)41F1U)S@A8@62@A1F12@V6@V)0G)90012180RAXtVG)98014)@2901a0185@A18330)3068@121R@6)42F1V)01P8W647G)9021U1R@6)4X o E6V12R614R12 f1A8T6)908Uc2)R68U2R614R12 dR)U)76R8Ua1T)U9@6)480Gc14T60)4P14@8U2R614R12 `)0801V1014R1R)3G)V@A15)R9P14@S6@A8UU21R@6)42a21148@901XR)Ph5)R9P14@2h40r013)0@647r29PP80GrVU8@X35V

E6V12R614R122@95G512674 bUU2@95612P92@562RU)21)4@A1213)64@21T14SA14@A1562RU)2901624178@6T1X I8P3U126i1 `)0)90P6R0)2G4@14G848UG212S19215qrT10@1F08@1845s64T10@1F08@1231R612g714)P12XYA12164RU951541SUG01r21v914R15231R612B38U1 23180r4)215F8@a3U8@G392ab448g2A9PP647F605ai1F08V64RAaFU94@r24)9@15RU647V62ACS6@AU)47r0185Be8RF6)C845U)47r084712R8VV)U5647 Bf6)484))3@6R8UP832a‚6r(845nm€U64W01852C@1RA4)U)7612714108@15FG@A1{10@1F08@1u14)P12e0)–1R@B{uekA@@32Dhh T10@1F08@1714)P1230)–1R@X)07CXp121U1R@15@A1231R612S1921564)0510@)01301214@8UUP8–)0T10@1F08@1U6418712XYA101S824)@828P3U1r 26i1R8UR9U8@6)4@A8@U15@)@A151R626)4)V@A128P3U126i1XI8P3U126i125)4)@833UG64@A121R821264@A1S8G@A1G833UG@))@A10 1f3106P14@2aSA101aV)01f8P3U1aP84G6456T6598U2801@12@1564@A128P121@93@)013U6R8@1@A10129U@Xt4@A1R821)V8714)P6R2g2@95GU6W1 )902a@A62S)9U5F13)226FU1)4UG6V@A101S10156VV1014@A67ArU1T1U8221PFU612)V@A128P1231R612aF9@2@6UU6@S)9U5F1T10G0801B1f@01P1UG R)2@UGC@)018RA82@8@62@6R8UUG29VV6R614@8P)94@)V6456T6598U2X

`)0)90P8R0)2G4@14GRA0)P)2)P8Ur2R8U1848UG212S19215714)P12@A8@A8T1F11421v914R158@8RA0)P)2)P1rU1T1UB–8384121P158W8a i1F08V62AaRA6RW14aV0)7aA9P84C@)R)P3801S6@A@A1293102R8VV)U5rU1T1U8221PFUG)V218U8P301G845@A12R8VV)U5rU1T1U8221PFUG)V642A)01 A87V62AXp1RA)21@A121231R612g714)P1264)0510@)01301214@RA0)P)2)P1rU1T1U8221PFU612V0)P82P84GT10@1F08@1U6418712823)226FU1 BS101301214@15@1U1)2@V62AaF6052a8P3A6F6842845P8PP8U2CSA14R)P38015@)@A1218U8P301GBU641871DU8P301G2C845@A1642A)01A87V62A BU641871DA87V62A12CXt4@A1012@)V@A1U6418712@A8@S1014)@01301214@15BA)U)2@184V62Aa2A80W2aR)1U8R84@A2a013@6U12C@A1018014)@84G RA0)P)2)P1rU1T1U8221PFU6128T86U8FU1G1@X`)0@A13903)21)V@A12@95G6@S82R06@6R8U@)64RU951RA0)P)2)P1rU1T1U8221PFU612Bv98U6@GCa 642@185)V82P84G8221PFU612823)226FU1Bv984@6@GCa@A8@S)9U54)@210T1@)012)UT1@A11T)U9@6)480Gv912@6)4X

`)0@A130)@164rR)56473AGU)714GS192158UU@A1231R612g714)P1264RU951564@A1d421PFUBTxrC58@8F821X `)0@A11f)46R3AGU)714GS19215@A1U)4712@0185r21v914R128T86U8FU1V0)P231R61201301214@6478UUP8–)0T10@1F08@1U6418712BA9P84845 P)921V)0P8PP8U2aRA6RW14V)0F6052a@90@U1845U6i805V)0013@6U12aV0)7V)08P3A6F6842aR)1U8R84@AV)0R)1U8R84@A2ai1F08V62AV)0@1U1)2@V62Aa 23)@@15780V)0A)U)2@184V62Aa1U13A84@2A80WV)02A80W2a218U8P301GV)0U8P301G2a642A)01A87V62AV)0A87V62A12CXYA128P3U126i16229VV6R614@ 64@10P2)Vv984@6@GB264R18UUP8–)0T10@1F08@1U641871280101301214@15C845v98U6@GBS19215)4UGU)47r018521v914R12CX

q8@81fRU926)42 p15654)@1fRU95184G714)P12)V231R612@A8@S)9U5A8T1R)4@06F9@15V90@A10@)@A1945102@845647)V@A11T)U9@6)4)V@A1}Yr{YU678452 845@A1}YHr{YH01R13@)02X 

t4)901f)46R3AGU)7141@6R@011a84G4)4rU8P301G}YHr{YH21v914R12U122@A84nmmmF3B6X1X64R)P3U1@1CS1011fRU9515a828U674P14@2)4  

2A)0@21v914R12P8GU8RW3)S10@)012)UT1231R612g01U8@6)42A632a0129U@64764S18WUG2933)0@157141@0112Xf1R8921)V@A1U8P301Gg2F828U ! "

3AGU)7141@6R3)26@6)4a8UUU8P301G}YHr{YH2BrsF3845U)4710CS10164RU9515X # $ % & H13U6R8@6)4 p1013U6R8@15)90P6R0)2G4@14GV6456472)4@A1562@06F9@6)4)V@A1}YHr{YH01R13@)0264T10@1F08@12a92647P8R0)2G4@14GB93@)nmmr7141 ' S645)SaRA0)P)2)P1r2R8U1Ca3AGU)714GB1f)46R84530)@164R)5647@0112C84584R12@08UBP833647)900176)42)V64@1012@F8RW@)39@8@6T1 84R12@08UT10@1F08@1)0RA)058@1RA0)P)2)P12C848UG212X

H845)P6i8@6)4 H845)P6i8@6)4S824)@01U1T84@64@A622@95GXp19215648UU@A1848UG212@A1714)P12S6@A@A1A67A12@rv98U6@G8221PFU612X

 fU645647 }90@12@2S101FU64564@A8@S1A854)@82267415231R6V6R48P12@)@A171412F1V)01)902G4@14G848UG2122A)S15RU180UGSA6RA714162 

)0@A)U)7)92@)SA6RAX         

H13)0@647V)0231R6V6RP8@1068U2a2G2@1P2845P1@A)52  

p101v960164V)0P8@6)4V0)P89@A)028F)9@2)P1@G312)VP8@1068U2a1f3106P14@8U2G2@1P2845P1@A)52921564P84G2@95612X‚101a6456R8@1SA1@A1018RAP8@1068Ua

2G2@1P)0P1@A)5U62@156201U1T84@@)G)902@95GXtVG)98014)@29016V8U62@6@1P833U612@)G)90012180RAa0185@A18330)3068@121R@6)4F1V)0121U1R@64780123)421X    

g8@1068U2c1f3106P14@8U2G2@1P2 g1@A)52    

4h8 t4T)UT1564@A12@95G 4h8 t4T)UT1564@A12@95G   o b4@6F)5612 o (Ater21v  

o d9W80G)@6RR1UUU6412 o `U)SRG@)P1@0G 

o e8U81)4@)U)7G o gHtrF82154190)6P87647  o b46P8U2845)@A10)078462P2 

o ‚9P84012180RA380@6R6384@2

o (U646R8U58@8   

! " # $ % & '