<<

Lampreys, the jawless vertebrates, contain only two ParaHox gene clusters

Huixian Zhanga,b,1, Vydianathan Ravia,1, Boon-Hui Taya, Sumanty Toharia, Nisha E. Pillaia, Aravind Prasada, Qiang Linb, Sydney Brennera,2, and Byrappa Venkatesha,c,2

aInstitute of Molecular and Cell Biology, Agency for Science, Technology and Research, Biopolis, Singapore 138673, Singapore; bChinese Academy of Sciences (CAS) Key Laboratory of Tropical Marine Bioresources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou 510301, China; and cDepartment of Paediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228, Singapore

Contributed by Sydney Brenner, July 6, 2017 (sent for review March 20, 2017; reviewed by José Luis Gómez-Skarmeta and Nipam H. Patel) ParaHox genes (, Pdx,andCdx) are an ancient family of develop- elephant shark, Callorhinchus milii; as well as the lobe-finned fish, mental genes closely related to the Hox genes. They play critical roles in (Latimeria chalumnae), and spotted gar (Lepisosteus the patterning of brain and gut. The basal chordate, amphioxus, con- oculatus), a basal ray-finned fish (Actinopterygian), possess an ad- tains a single ParaHox cluster comprising one member of each family, ditional Pdx gene (called Pdx2)linkedtoGsx2 (Fig. 1) (8–10). In- whereas nonteleost jawed vertebrates contain four ParaHox genomic terestingly, teleosts that have experienced an additional round of loci with six or seven ParaHox genes. Teleosts, which have experienced genome duplication (3R) possess only six ParaHox genes distrib- an additional whole-genome duplication, contain six ParaHox genomic uted in six genomic loci (11, 12) (Fig. 1). As a consequence of the loci with six ParaHox genes. Jawless vertebrates, represented by lam- secondary gene loss, teleosts do not contain an intact cluster bearing preys and hagfish, are the most ancient group of vertebrates and are the three ParaHox genes and lack the Pdx2 gene (Fig. 1). On the crucial for understanding the origin and evolution of vertebrate gene basis of the number and genomic organization of ParaHox genes in families. We have previously shown that lampreys contain six Hox jawed vertebrates (gnathostomes), it can be inferred that the com- gene loci. Here we report that lampreys contain only two ParaHox mon ancestor of gnathostomes contained at least seven ParaHox gene clusters (designated as α-andβ-clusters) bearing five ParaHox genes located on four chromosomes. genes (Gsxα, Pdxα, Cdxα, Gsxβ,andCdxβ). The order and orientation Jawless vertebrates are the most ancient lineage of verte- EVOLUTION of the three genes in the α-cluster are identical to that of the single brates. The extant jawless vertebrates (cyclostomes) are repre- cluster in amphioxus. However, the orientation of Gsxβ in the β-cluster sented by two lineages, the lampreys and hagfishes, which is inverted. Interestingly, Gsxβ is expressed in the eye, unlike its homo- together form a monophyletic group (13). They possess primitive logs in jawed vertebrates, which are expressed mainly in the brain. The morphological features such as a single median “nostril” located lamprey Pdxα is expressed in the pancreas similar to jawed vertebrate dorsally compared with a pair of nostrils located ventrally in Pdx genes, indicating that the pancreatic expression of Pdx was ac- gnathostomes. In addition, they lack paired appendages, miner- quired before the divergence of jawless and jawed vertebrate lineages. alized tissues, hinged jaws, and a spleen (14). Although cyclo- It is likely that the lamprey Pdxα plays a crucial role in pancreas spec- stomes possess complex brains divided into forebrain, midbrain, ification and insulin production similar to the Pdx of jawed vertebrates. and hindbrain, similar to in gnathostomes, some subregions of the cyclostome brains are not as elaborate as those seen in jawless vertebrate | jawed vertebrates | ParaHox gene cluster | pancreas gnathostomes (15, 16). The intermediate phylogenetic position of cyclostomes between invertebrates and gnathostomes, and he ParaHox gene cluster is an evolutionary sister of the Hox Tgene cluster and comprises three physically linked Significance gene families: Gsx (known as GSH in human), Pdx (aka Xlox or Ipf1), and . The clustered organization of these genes was first Lampreys and hagfishes are the only living members of jawless discovered in the cephalochordate amphioxus, although their origin vertebrates, the most ancient lineage of vertebrates, and are predates the origin of sponges (1). They encode homeodomain- therefore a crucial group for understanding the evolution of containing transcription factors that are involved in the patterning vertebrates. ParaHox genes (Gsx, Pdx, and Cdx) are an impor- of the nervous system and the gut. Gsx isknowntobeinvolvedin tant family of developmental genes that play critical roles in patterning of the brain (2), whereas Pdx is involved in the devel- the patterning of brain, pancreas, and posterior gut of jawed opment of the invertebrate gut and vertebrate pancreas (3, 4). Cdx vertebrates. Here we show that lampreys contain two ParaHox genes, in contrast, are involved in the development of the posterior gene clusters compared with four ParaHox loci in most jawed gut, anus, and posterior neural tube (5). vertebrates. The lamprey Gsxβ gene is expressed specifically in The cephalochordate amphioxus possesses a single ParaHox the eye, an unusual expression domain for Gsx genes. The cluster containing three genes, Gsx, Xlox,andCdx,arrangedinthat pancreatic expression of the lamprey Pdx gene suggests the order, with Cdx present in the opposite orientation to the other two crucial role of Pdx in pancreas specification and insulin pro- genes (6). In contrast to amphioxus, vertebrates possess additional duction evolved in the common ancestor of vertebrates. members of ParaHox genes distributed on four or more chromo- some locations. For example, the human genome contains six Author contributions: S.B. and B.V. designed research; H.Z., V.R., B.-H.T., S.T., N.E.P., and A.P. performed research; V.R., Q.L., S.B., and B.V. analyzed data; and B.V. wrote the paper. ParaHox genes comprising an intact cluster (GSX1, PDX1,and Reviewers: J.L.G.-S., Consejo Superior de Investigaciones Científicas and Universidad Pablo CDX2) located on chromosome 13, a GSX2 (aka GSH2) located on de Olavide; and N.H.P., University of California, Berkeley. chromosome 4, a CDX1 gene on chromosome 5, and a third CDX The authors declare no conflict of interest. gene, termed CDX4, on chromosome X (Fig. 1) (7). The duplicate Data deposition: The sequences reported in this paper have been deposited in the GenBank copies of genes flanking the paralogous ParaHox genes on the four database (accession nos. KY563327–KY563336). human chromosomes provide support to the hypothesis that the 1H.Z. and V.R. contributed equally to this work. four ParaHox gene loci are the result of two rounds of whole- 2To whom correspondence may be addressed. Email: [email protected] or genome duplication (WGD) at the base of the vertebrates (7). [email protected]. Cartilaginous fishes (Chondrichthyes) such as the lesser spotted This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. dogfish, Scyliorhinus canicula; little skate, Leucoraja erinacea;and 1073/pnas.1704457114/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1704457114 PNAS Early Edition | 1of6 Downloaded by guest on September 24, 2021 and/or reverse-transcription PCR (RT-PCR; SI Appendix, Ma- terials and Methods), we identified five ParaHox genes orga- nized in two clusters in the Japanese lamprey genome. Because we could not determine the orthology of these genes to Para- Hox genes in gnathostomes (see following sections), we designated the Gsx, Pdx,andCdx genes on scaffold_1054+1722 as Gsxα, Pdxα, and Cdxα (α-cluster), and the Gsx and Cdx genes on scaffold_14 as Gsxβ and Cdxβ (β-cluster) (Fig. 2). In addition to genome assembly searches, we carried out degenerate PCR, using genomic DNA as a template to determine whether there were any additional ParaHox genes in Japanese lamprey (Materials and Methods). The de- generate PCRs were able to amplify fragments of Gsxα, Pdxα, Cdxα, and Cdxβ genes, but not Gsxβ, because of the divergence of its sequences corresponding to the forward primers and the presence of a lamprey-specific intron (2.9 kb) in the region targeted by PCR primers (Materials and Methods). Nevertheless, no new ParaHox genes were identified in the PCR products.

ParaHox Genes in the Sea Lamprey Genome. Searches for ParaHox genes in the genome assembly of the sea lamprey (www.ensembl. org/Petromyzon_marinus/Info/Index) identified Gsxβ and Cdxβ genes on scaffold_GL477479 (Fig. 2), and the first exon of Pdxα on scaffold_GL481998. The intergenic region of the sea lamprey Gsxβ and Cdxβ genes spanned ∼22 kb and contained three gaps (100 bp, 377 bp, and 100 bp). We filled these gaps by genomic PCR (GenBank accession numbers: KX400882–KX400884) and noted that the intergenic region does not contain a Pdx gene. It is therefore likely that the intergenic region of Japanese lamprey Gsxβ and Cdxβ also does not contain a Pdx gene.

ParaHox Genes in Lamprey RNA-seq Data. To determine whether there are any more ParaHox genes that are not represented in the genome assembly or in degenerate PCR products, we gen- erated RNA-seq data for four tissues (brain, eye, and intestine from Japanese lamprey pancreas from brook lamprey, Lampetra planeri) that were identified as the main tissues expressing ParaHox genes in lampreys by RT-PCR analysis (Expression Patterns Fig. 1. ParaHox gene loci in jawed vertebrates (gnathostomes). Organiza- of Lamprey ParaHox Genes). Searches of the assembled tran- tion of ParaHox gene loci in representative gnathostomes. Genes are scripts revealed transcripts for Gsxα in the brain, Gsxβ in the eye, depicted as block arrows, with the direction of arrows denoting the transcrip- Cdxα and Cdxβ in the intestine, and Pdxα in the pancreas. We tional orientation. The star represents the whole-genome duplication in the also searched RNA-seq transcripts of sea lamprey embryos from teleost lineage. GSX gene in human is known as GSH. Chromosomal/scaffold – – numbers of the ParaHox genes are shown at the right of each locus. stages 23 25 and 26 28 (18) and identified fragments of tran- scripts for Cdxβ, Gsxα, and Pdxα (SI Appendix, Sequence File). We generated full-length cDNA sequence for sea lamprey Gsxα, their distinctive morphological features, make them a crucial Pdxα, and Cdxβ by 5′/3′-RACE, using total RNA from embryo group for understanding the origin of additional copies of stages 26–28 as a template (Materials and Methods). Overall, our ParaHox genes and their functional diversification in verte- analysis of genome and transcriptome sequences indicate that α β α brates. A single ParaHox cluster has been identified in two lampreys contain only five ParaHox genes (Gsx , Gsx , Pdx , α β species of hagfish: the inshore hagfish (Eptatretus burgeri)and Cdx , and Cdx ) that are organized in two clusters (Fig. 2). the Atlantic hagfish (Myxine glutinosa) (17). Interestingly, this Phylogenetic Analysis and Synteny. To determine the orthology of cluster contains a Gsx and a Cdx gene with a pseudogenized Pdx lamprey and gnathostome ParaHox genes, we carried out phy- gene in the intergenic region. The organization and comple- logenetic analysis (Bayesian Inference) of ParaHox genes from ment of ParaHox genes in lampreys remains unknown. In this lamprey, hagfish, and representative gnathostomes, with am- study, we have carried out an exhaustive search for ParaHox phioxus as the outgroup. However, phylogenetic analysis was not genes in the genome assemblies of two species of lamprey (the informative, as the lamprey and hagfish (cyclostomes) genes Japanese lamprey, Lethenteron japonicum,andthesealamprey, generally formed an exclusive cluster outside the gnathostome Petromyzon marinus) and RNA-seq data from lamprey brain, clades (SI Appendix, Fig. S1). This pattern of independent clus- eye, intestine, and pancreas. Our analyses provide evidence for tering of cyclostome sequences outside gnathostomes sequences the presence of five ParaHox genes organized into two clusters suggests the duplication of cyclostome sequences occurred in- in lampreys. dependent from the duplication event in gnathostomes. How- ever, this clustering pattern is likely to be an artifact because of Results the high GC-content of the lamprey genome, which affects the ParaHox Genes in the Japanese Lamprey Genome. We searched the codon use pattern and amino acid composition of lamprey germline genome assembly of the Japanese lamprey (jlampreygenome. protein-coding sequences. As a result, sequences of lamprey imcb.a-star.edu.sg/) for ParaHox genes, using human, coelacanth, paralogues are more similar to each other than to their orthologs and elephant shark ParaHox proteins as TBLASTN queries. in gnathostomes. A similar pattern of exclusive clustering has On the basis of a combination of sequence analysis and RACE been previously observed for several lamprey genes such as Hox,

2of6 | www.pnas.org/cgi/doi/10.1073/pnas.1704457114 Zhang et al. Downloaded by guest on September 24, 2021 Fig. 2. ParaHox gene loci in the Japanese (A) and sea lamprey (B). The black rectangles denote the exons. The transcriptional orientation of the genes is indicated by the arrows. Gray circles represent the ends of scaffolds/contigs. The gap between Japanese lamprey scaffolds 1054 and 1722 was filled (dotted line) by genomic PCR.

KCNA, and the family (19–21). Thus, phylogenetic analysis the brain, eye, and intestine by RT-PCR. Both Cdxα and Cdxβ was not useful in assigning orthology between lamprey and gna- genes were found to express in the intestine (Fig. 4), similar to thostome ParaHox genes. Nevertheless, phylogenetic analysis was Cdx1, Cdx2, and Cdx4 genes of mammals (22) and Cdx1a and able to infer the orthology between Japanese lamprey and sea lam- Cdx1b ofzebrafish(23,24).TheJapaneselampreyGsxα showed prey sequences and between lamprey and hagfish sequences with expression in the brain (Fig. 4), similar to Gsx1 and Gsx2 genes of EVOLUTION high statistical support (posterior probability values, 0.96–1.0; SI mouse (25, 26) and zebrafish (27, 28). However, the Japanese Appendix,Fig.S1). Thus, we infer that the hagfish Gsx and Cdx genes lamprey Gsxβ did not show expression in the brain, but instead was (SI Appendix,Fig.S2) are orthologs of Japanese lamprey Gsxα and foundtoexpressintheeye,inadditiontoalowlevelofexpression Cdxα, respectively. in the intestine (Fig. 4). None of the vertebrate ParaHox genes Japanese lamprey Gsxβ and Cdxβ genesarelocatedonalarge characterized so far have been shown to express in the eye. Thus, scaffold (5.3 Mb) and are flanked by several genes. We compared the eye expression of lamprey Gsxβ seems to be an expression the synteny of the flanking genes with those flanking various gna- domain specific to the lamprey lineage. In gnathostomes, Pdx genes thostome ParaHox genes to see whether synteny provides any clue express specifically in the pancreatic tissue (4). Because we did not to the orthology of these lamprey ParaHox genes. However, the have access to the pancreatic tissue of the Japanese lamprey, we combination of paralogous genes flanking the lamprey Gsxβ and obtained cDNA from the pancreatic tissue of postmetamorphic Cdxβ genes is not found in any gnathostome ParaHox locus. For juvenile sea lamprey and checked for the expression of Pdxα.The example, homologs of lamprey Pan3, Mtif3,andGtf3a found in this sea lamprey Pdxα was indeed found to express in this tissue (Fig. 4). locus (scaffold_14) are present in the gnathostome Gsx1-Pdx1-Cdx2 locus (SI Appendix,Fig.S2), whereas homologs of lamprey Kit and Conserved Noncoding Elements. Noncoding elements conserved in Rasl11b, also found on this locus (scaffold_14), are present in the distant vertebrates have been shown to often function as en- gnathostome (e.g., elephant shark) Gsx2-Pdx2 locus (SI Appendix, hancers (20, 29, 30). To determine whether the cyclostome Fig. S2). This mixed pattern of syntenic genes in the lamprey sug- ParaHox gene loci contain any evolutionarily conserved regula- gests paralogous genes flanking the ParaHox genes were lost in- tory elements, we generated MLAGAN alignments of ParaHox dependently in the lamprey after it diverged from the gnathostome loci from lamprey, hagfish, and selected gnathostomes (elephant lineage. As such, synteny is also not useful in informing the shark, human, coelacanth, spotted gar, and zebrafish) and pre- orthology of lamprey and gnathostome ParaHox genes. dicted conserved noncoding elements (CNEs) that are at least 70% identical across >100-bp windows (Materials and Methods). Genomic Organization of Lamprey ParaHox Genes. The Japanese Because the orthologous relationship between the cyclostome lamprey Gsxα and Pdxα are on the same strand of DNA, whereas and gnathostome ParaHox genes was not clear, we aligned the Cdxα is on the opposite strand, similar to the organization of Gsx, Japanese lamprey and hagfish Gsxα locus with the gnathostome Pdx,andCdx genes in amphioxus and gnathostomes. However, the Gsx1, as well as Gsx2 locus and predicted CNEs. Both lamprey Japanese lamprey Gsxβ gene is on the same strand as Cdxβ,whichis and hagfish Gsxα loci were found to share a CNE with the an unusual arrangement. The inverted orientation of Gsxβ indicates gnathostome Gsx1 locus (Fig. 5A), but none with the gnathos- this gene underwent a local inversion in the lamprey lineage. The tome Gsx2 locus (SI Appendix, Fig. S4). The sharing of a CNE protein encoded by the lamprey Gsxβ contains several amino acid between the Japanese lamprey and hagfish Gsxα locus provides substitutions in the homeodomain (14 of 60) that is nearly totally further support to our inference based on phylogenetic analysis conserved in the Gsx proteins of all other chordates, including Gsxα that these loci are indeed orthologous. The CNE identified in the of lamprey (Fig. 3). These substitutions might have altered the lamprey and hagfish loci is actually part of a previously reported DNA binding specificity of the lamprey Gsxβ protein. The lamprey Gsx1 enhancer (#hs532) that is highly conserved in mouse and Gsxβ, Cdxβ,andPdxα genes are also unusual, in that they each human and has been shown to drive expression of a reporter gene contain an extra intron compared with their homologs in gnathos- to the forebrain, midbrain, and hindbrain in embryonic day (E)11.5 tomes (Fig. 3 and SI Appendix,Fig.S3). transgenic mouse embryos (VISTA Enhancer Browser: https:// enhancer.lbl.gov/). To check whether lamprey and hagfish share any Expression Patterns of Lamprey ParaHox Genes. We examined the unique CNEs, we aligned the Gsxα-Pdxα-Cdxα locus of Japanese expression pattern of the Japanese lamprey ParaHox genes in lamprey with the Gsxα-Cdxα locus of hagfish and predicted CNEs

Zhang et al. PNAS Early Edition | 3of6 Downloaded by guest on September 24, 2021 Fig. 3. Comparison of intron positions in the Gsx/Gsh genes of gnathostomes and jawless vertebrates. Multiple alignments of Gsx/Gsh proteins from representative gnathostomes (human, coelacanth, spotted gar, and elephant shark), jawless vertebrates (Japanese lamprey, sea lamprey, and inshore hagfish), and the cepha- lochordate amphioxus showing a selected region of the alignment for the Gsx/Gsh family. Intron positions are shown as arrowheads. The numbers next to the ar- rowheads indicate intron phases. Introns identified in the lampreys are shown as red arrows. The amino acid positions of the human sequences are shown. The boxed region denotes the homeodomain. The unique amino acid substitutions in the Hox domain of lamprey Gsxβ are shown in red font.

(SI Appendix,Fig.S5). Although we found an element unique to the hagfish ortholog share a CNE with the A-cluster of gnathostomes. two species (second CNE in SI Appendix,Fig.S5A), it turned out to Thus, the α-clusters of lamprey and hagfish are likely to be ortho- be a repetitive sequence (SI Appendix,Fig.S5B). logs of the gnathostome A-cluster. To examine whether the second ParaHox locus (Gsxβ-Cdxβ The β-cluster of lamprey contains a Gsx (Gsxβ) and a Cdx locus) in Japanese lamprey and sea lamprey share any CNEs with (Cdxβ) gene, a combination not found in any jawed vertebrate the gnathostome ParaHox gene loci, we aligned the lamprey loci ParaHox gene loci. Thus, lamprey seems to have retained a with each of the four ParaHox gene loci of the gnathostomes and different complement of ParaHox genes after the WGDs that predicted CNEs. However, no CNE was found to be shared gave rise to multiple ParaHox gene loci in vertebrates. In- between these lamprey loci and the gnathostome Gsx1 and Gsx2 terestingly, unlike Gsx genes of gnathostomes that express mainly loci (SI Appendix, Fig. S6 A and B). in the brain (2, 31), the lamprey Gsxβ gene is expressed mainly in the eye (Fig. 4). In mammals, Gsx genes are expressed in the Functional Assay of CNE. To investigate whether the single CNE progenitor cells of the embryonic brain and are involved in the conservedintheJapaneselampreyGsxα-Pdxα-Cdxα locusisan enhancer, we generated transgenic zebrafish bearing sequences specification of the lateral ganglionic eminence progenitors, encompassing the lamprey CNE (964 bp) linked to a GFP reporter. which in turn give rise to striatal projection neurons and olfac- α As a control, we assayed the homologous CNE region (670 bp) tory bulb interneurons (32). It is likely that the lamprey Gsx , from the zebrafish gsx1 locus. In situ hybridization studies have which expresses in the brain, performs a similar function. How- β shown that zebrafish Gsx1 is expressed in the diencephalon, mes- ever, the function of the lamprey Gsx which is expressed in the encephalon (midbrain), and rhombomeres (hindbrain) (27). The eye, is unclear and remains to be investigated. In addition to eye, zebrafish gsx1 CNE was able to drive reproducible expression in the Gsxβ is also expressed in the intestine at very low levels (Fig. 4), a hindbrain starting from the midbrain–hindbrain boundary and in parts of the midbrain of F1 transgenic zebrafish embryos (Fig. 5B), thereby recapitulating part of the expression pattern of the zebrafish gsx1 gene (27). In addition, high levels of GFP expression were also seen in branchial arches of transgenic zebrafish embryos (Fig. 5B). This is likely to be ectopic expression driven by the CNE construct. In contrast to the zebrafish CNE construct, the lamprey CNE failed to drive any recognizable GFP expression in F0 as well as F1 transgenic zebrafish embryos. Thus, it appears that the zebrafish transcription factors are unable to recognize the lamprey CNE, which is shorter and more divergent compared with its orthologous CNE from zebrafish and other gnathostomes. Testing the lamprey CNE in a transgenic lamprey system should clarify whether they indeed function as enhancers in lamprey. Discussion Our analysis of genome assemblies of two species of lamprey (Japanese lamprey and sea lamprey) and transcriptomes from lamprey brain, eye, intestine, and pancreas identified a total of five ParaHox genes (Gsxα, Pdxα, Cdxα, Gsxβ,andCdxβ). These genes were found to be organized in two clusters (α-andβ-clusters) in the Japanese lamprey genome. This is in contrast to the four ParaHox genomic loci containing six or seven genes in nonteleost gnathos- tomes and six ParaHox genomic loci, each with one ParaHox gene α Fig. 4. Expression pattern of lamprey ParaHox genes. RT-PCR analysis of lamprey in teleosts (Fig. 1). The -cluster of lamprey is a complete cluster ParaHox genes. Liver cDNA was used as a negative control. β-actin was amplified comprising a Gsx, a Pdx, and a Cdx gene similar to the A-cluster of as an internal control to assess the quality of the cDNA. Because we did not have nonteleost gnathostomes and the single ParaHox cluster in am- access to pancreatic tissue from Japanese lamprey, we used cDNA from the phioxus (Fig. 1). Furthermore, this lamprey ParaHox cluster and its pancreatic tissue of sea lamprey for RT-PCR of Pdxα (gel image on the Right).

4of6 | www.pnas.org/cgi/doi/10.1073/pnas.1704457114 Zhang et al. Downloaded by guest on September 24, 2021 EVOLUTION

Fig. 5. Potential cis-regulatory elements in the ParaHox gene loci. (A) VISTA plot of the MLAGAN alignment of the Gsx1-Pdx1-Cdx2 locus from representative gnathostomes (elephant shark, human, coelacanth, spotted gar, zebrafish) with the Gsxα-Pdxα-Cdxα locus from Japanese lamprey and Gsxα-Cdxα locus from the inshore hagfish. The elephant shark sequence was used as the base (x-axis). The y-axis represents percentage sequence identity. Blue peaks represent conserved exons, whereas pink peaks represent CNEs. The Japanese lamprey CNE is 99 bp long and 75.8% identical, whereas the hagfish CNE is 123 bp long and 76.4% identical. (B) 72-hpf F1 transgenic zebrafish embryos showing GFP expression driven by a CNE identified in the zebrafish gsx1 locus. hb, hindbrain; mb, midbrain. This expression was observed in embryos derived from four independent founders.

domain in which Cdx genes are known to express at high levels. lineages (34). However, an alternative possibility is that the In the ParaHox clusters of gnathostomes and amphioxus, Gsx lamprey ParaHox clusters duplicated along with the Hox clusters genes are always located at the 5′ end of the cluster and are as part of the WGD events, but subsequently experienced sec- expressed in the brain, whereas Cdx genes are found at the 3′ end ondary losses, resulting in the retention of only five ParaHox of the cluster and are typically expressed in the hindgut. The genes in two loci. This possibility is supported by the presence of unusual expression of the lamprey Gsxβ in the intestine could be some Japanese lamprey paralogues whose gnathostome homo- a result of the inversion of this gene in the lamprey lineage, logs are linked to the additional ParaHox gene loci. For example, which might have brought its promoter in close proximity to the gnathostome Pdgfra and Kit genes are linked to the ParaHox intestine enhancer of Cdxβ, resulting in the somewhat leaky ex- “B-cluster,” whereas their paralogues, Pdgfrb and Csf1r,arelinked pression of Gsxβ in the intestine. to the ParaHox “C-cluster” (SI Appendix, Fig. S2). The Japanese Most gnathostomes contain four paralogous loci for several lamprey genome contains homologs of both Kit and Csf1r genes, including the Hox genes and ParaHox genes, whereas (SI Appendix,Fig.S7A). Although the lamprey Kit is linked amphioxus, a basal chordate, contains a single locus for these to the β-cluster (SI Appendix,Fig.S2), Csf1r is located on genes. The quadruple loci in gnathostomes have been attributed scaffold_154 that does not contain any ParaHox gene. The Japa- to the two rounds of WGD during the early evolution of verte- nese lamprey genome also contains two Pdgfr genes (SI Appendix, brates (33). We have previously shown that the Japanese lam- Fig. S7B), located on scaffold_1691 and scaffold_22, that do not prey contains at least six loci, in contrast to four Hox harbor any ParaHox genes. The presence of Kit and Csf1r, and two clusters in most gnathostomes (20), suggesting the Hox clusters Pdgfr paralogues in the lamprey genome, suggests the ancestor of may have experienced an additional round of duplication in the Japanese lamprey contained more than two ParaHox clusters, and lamprey lineage (20). However, the Japanese lamprey contains the ParaHox genes in the additional clusters were subsequently only two ParaHox gene loci. This suggests the duplication event lost in the lamprey lineage. Thus, it seems likely that the ParaHox that gave rise to the six Hox loci did not include the ParaHox loci in the lamprey lineage underwent at least two rounds of du- cluster, implying that the additional Hox loci are not the result of plication, along with the Hox clusters, as a result of WGD events, WGD, but are instead a result of segmental duplication or du- and subsequently only two ParaHox loci were retained in the lam- plications, a scenario recently proposed based on the meiotic prey lineage because of extensive secondary loss of ParaHox genes. map of the sea lamprey genome (34). According to this study, the The homolog of the Pdx gene in invertebrates, known as Xlox, jawless vertebrate and gnathostome lineages experienced only is expressed in the gut, whereas Pdx in gnathostomes is expressed one round of WGD before their separation, which was followed in the pancreas, an organ specific to vertebrates. In mammals, by several independent segmental duplications in the two Pdx1 plays a crucial role in the development of the pancreas,

Zhang et al. PNAS Early Edition | 5of6 Downloaded by guest on September 24, 2021 differentiation of beta cells, and maintenance of their function data includes transcripts (SI Appendix) for the three hormones (35), and is therefore vital for insulin production. Inactivation of present in the pancreas of cartilaginous fishes, indicating that the Pdx1 in beta cells of mouse is known to result in maturity-onset three hormone pancreas had evolved in the common ancestor of diabetes (36) whereas mutations in human PDX1 (also known as all vertebrates. IPF1) result in pancreatic agenesis (37). Characterization of a ParaHox cluster in hagfish had shown that the Pdx gene in this Materials and Methods locus has become a pseudogene (17). Our study shows that the The Japanese lamprey and sea lamprey genome assemblies were searched for ortholog of this Pdx gene in lampreys is intact and expresses in ParaHox genes by TBLASTN, using selected vertebrate ParaHox proteins as the pancreas. Thus, the Pdx gene in lamprey is likely to be queries. RNA-seq reads for the Japanese lamprey brain, eye, and intestine and functional. In contrast to hagfish, whose islet organ (endocrine brook lamprey pancreas were generated on the Illumina platform and as- pancreas) comprises scattered follicles, lampreys contain discrete sembled using Trinity (40). To look for additional ParaHox genes in the ge- islet organs (38). The presence of a functional Pdx gene in nome of the lamprey, degenerate PCR was carried out using genomic DNA lampreys supports the notion that the development of its discrete as a template. Details of the primers used and PCR conditions are given in islet organ, as well as insulin production, is mediated by Pdx the SI Appendix. Phylogenetic analysis of ParaHox protein sequences were performed using Bayesian Inference as implemented in MrBayes (version similar to that in gnathostomes. In humans, PDX1 functions 3.2.6; mrbayes.sourceforge.net/). CNEs in the ParaHox loci were predicted by in pancreas as part of a network comprising insulin, glucagon, aligning the sequences with MLAGAN and visualized using VISTA (genome. SLC2A2, FOXA2, HNF1A, NEUROD1, PRKACB, and PRKACG lbl.gov/vista/index.shtml). The Japanese lamprey and zebrafish CNEs were (STRING database, https://string-db.org/). The brook lamprey pan- assayed for enhancer function in transgenic zebrafish using GFP as a re- creas RNA-seq data contains transcripts (SI Appendix) for genes porter. Further details of the materials and methods are given in the encoding homologs of all these proteins except PRKACG. Thus, SI Appendix. although lampreys do not possess a distinct pancreatic gland similar Detailed SI Appendix, Materials and Methods, SI Appendix, Sequence File, to jawed vertebrates, most of the genes associated with the Pdx- SI Appendix, Tables S1 and S2, and SI Appendix, Figs. S1–S10 are provided in network of jawed vertebrate pancreas are present and expressed in the SI Appendix. the islet organ of lampreys. A previous study had shown that carti- laginous fishes possess a three-hormone (insulin, glucagon, and so- ACKNOWLEDGMENTS. We thank Michael Schorpp and Thomas Boehm for providing pancreatic tissue of brook lamprey, and Jonathan M. Wilson for matostatin) pancreas and that the four-hormone (the three plus sea lamprey pancreas cDNA. This work was supported by the Biomedical pancreatic polypeptide) pancreas found in coelacanth and tetrapods Research Council of the Agency for Science, Technology and Research of was a later innovation (39). The brook lamprey pancreas RNA-seq Singapore.

1. Fortunato SAV, et al. (2014) Calcisponges have a ParaHox gene and dynamic ex- 22. Beck F, Stringer EJ (2010) The role of Cdx genes in the gut and in axial development. pression of dispersed NK homeobox genes. Nature 514:620–623. Biochem Soc Trans 38:353–357. 2. Valerius MT, et al. (1995) Gsh-1: A novel murine homeobox gene expressed in the 23. Davidson AJ, Zon LI (2006) The caudal-related homeobox genes cdx1a and act central nervous system. Dev Dyn 203:337–351. redundantly to regulate hox gene expression and the formation of putative hema- 3. Johnson JD, et al. (2003) Increased islet apoptosis in Pdx1+/- mice. J Clin Invest 111: topoietic stem cells during zebrafish embryogenesis. Dev Biol 292:506–518. 1147–1160. 24. Flores MV, et al. (2008) Intestinal differentiation in zebrafish requires Cdx1b, a 4. Jonsson J, Carlsson L, Edlund T, Edlund H (1994) Insulin-promoter-factor 1 is required functional equivalent of mammalian Cdx2. Gastroenterology 135:1665–1675. for pancreas development in mice. Nature 371:606–609. 25. Diez-Roux G, et al. (2011) A high-resolution anatomical atlas of the transcriptome in 5. Gamer LW, Wright CVE (1993) Murine Cdx-4 bears striking similarities to the Dro- the mouse embryo. PLoS Biol 9:e1000582. sophila caudal gene in its homeodomain sequence and early expression pattern. 26. Gray PA, et al. (2004) Mouse brain organization revealed through direct genome- – Mech Dev 43:71–81. scale TF expression analysis. Science 306:2255 2257. 6. Osborne PW, Ferrier DE (2010) Chordate Hox and ParaHox gene clusters differ dra- 27. Cheesman SE, Eisen JS (2004) gsh1 demarcates hypothalamus and intermediate spinal – matically in their repetitive element content. Mol Biol Evol 27:217–220. cord in zebrafish. Gene Expr Patterns 5:107 112. 7. Pollard SL, Holland PWH (2000) Evidence for 14 homeobox gene clusters in human 28. Satou C, et al. (2013) Transgenic tools to characterize neuronal properties of discrete – genome ancestry. Curr Biol 10:1059–1062. populations of zebrafish neurons. Development 140:3927 3931. 8. Mulley JF, Holland PW (2010) Parallel retention of Pdx2 genes in cartilaginous fish and 29. Bhatia S, et al. (2014) A survey of ancient conserved non-coding elements in the PAX6 locus reveals a landscape of interdigitated cis-regulatory archipelagos. Dev Biol . Mol Biol Evol 27:2386–2391. 387:214–228. 9. Mulley JF, Holland PW (2014) Genomic organisation of the seven ParaHox genes of 30. Parker HJ, Piccinelli P, Sauka-Spengler T, Bronner M, Elgar G (2011) Ancient Pbx-Hox coelacanths. J Exp Zoolog B Mol Dev Evol 322:352–358. signatures define hundreds of vertebrate developmental enhancers. BMC Genomics 10. Braasch I, et al. (2016) The spotted gar genome illuminates vertebrate evolution and 12:637. facilitates human-teleost comparisons. Nat Genet 48:427–437. 31. Hsieh-Li HM, et al. (1995) Gsh-2, a murine homeobox gene expressed in the de- 11. Mulley JF, Chiu CH, Holland PWH (2006) Breakup of a homeobox cluster after genome veloping brain. Mech Dev 50:177–186. duplication in teleosts. Proc Natl Acad Sci USA 103:10369–10372. 32. Pei Z, et al. (2011) Homeobox genes Gsx1 and Gsx2 differentially regulate telence- 12. Siegel N, Hoegg S, Salzburger W, Braasch I, Meyer A (2007) Comparative genomics of phalic progenitor maturation. Proc Natl Acad Sci USA 108:1675–1680. ParaHox clusters of teleost fishes: Gene cluster breakup and the retention of gene 33. Putnam NH, et al. (2008) The amphioxus genome and the evolution of the chordate sets following whole genome duplications. BMC Genomics 8:312. karyotype. Nature 453:1064–1071. 13. Kuraku S, Hoshiyama D, Katoh K, Suga H, Miyata T (1999) Monophyly of lampreys and 34. Smith JJ, Keinath MC (2015) The sea lamprey meiotic map improves resolution of – hagfishes supported by nuclear DNA-coded genes. J Mol Evol 49:729 735. ancient vertebrate genome duplications. Genome Res 25:1081–1090. 14. Hardisty MW (1979) Biology of the Cyclostomes (Springer, New York), pp 428. 35. Kaneto H, et al. (2008) PDX-1 and MafA play a crucial role in pancreatic beta-cell 15. Murakami Y, Uchida K, Rijli FM, Kuratani S (2005) Evolution of the brain de- differentiation and maintenance of mature beta-cell function. Endocr J 55:235–252. – velopmental plan: Insights from agnathans. Dev Biol 280:249 259. 36. Ahlgren U, Jonsson J, Jonsson L, Simu K, Edlund H (1998) beta-cell-specific inactivation 16. Wicht H (1996) The brains of lampreys and hagfishes: Characteristics, characters, and of the mouse Ipf1/Pdx1 gene results in loss of the beta-cell phenotype and maturity – comparisons. Brain Behav Evol 48:248 261. onset diabetes. Genes Dev 12:1763–1768. 17. Furlong RF, et al. (2007) A degenerate ParaHox gene cluster in a degenerate verte- 37. Stoffers DA, Zinkin NT, Stanojevic V, Clarke WL, Habener JF (1997) Pancreatic agenesis – brate. Mol Biol Evol 24:2681 2686. attributable to a single nucleotide deletion in the human IPF1 gene coding sequence. 18. Ravi V, et al. (2016) Cyclostomes lack clustered protocadherins. Mol Biol Evol 33: Nat Genet 15:106–110. 311–315. 38. Youson JH, Al-Mahrouki AA (1999) Ontogenetic and phylogenetic development of 19. Coffill CR, et al. (2016) The p53-Mdm2 interaction and the E3 ligase activity of Mdm2/ the endocrine pancreas (islet organ) in fish. Gen Comp Endocrinol 116:303–335. Mdm4 are conserved from lampreys to humans. Genes Dev 30:281–292. 39. Mulley JF, Hargreaves AD, Hegarty MJ, Heller RS, Swain MT (2014) Transcriptomic 20. Mehta TK, et al. (2013) Evidence for at least six Hox clusters in the Japanese lamprey analysis of the lesser spotted catshark (Scyliorhinus canicula) pancreas, liver and brain (Lethenteron japonicum). Proc Natl Acad Sci USA 110:16044–16049. reveals molecular level conservation of vertebrate pancreas function. BMC Genomics 21. Qiu H, Hildebrand F, Kuraku S, Meyer A (2011) Unresolved orthology and peculiar 15:1074. coding sequence properties of lamprey genes: The KCNA gene family as test case. 40. Grabherr MG, et al. (2011) Full-length transcriptome assembly from RNA-Seq data BMC Genomics 12:325. without a reference genome. Nat Biotechnol 29:644–652.

6of6 | www.pnas.org/cgi/doi/10.1073/pnas.1704457114 Zhang et al. Downloaded by guest on September 24, 2021