Genomics 100 (2012) 203–211


Genomics

journal homepage: www.elsevier.com/locate/ygeno

Expansion of transducin subunit gene families in early vertebrate tetraploidizations

David Lagman ⁎, Görel Sundström 1, Daniel Ocampo Daza, Xesús M. Abalo, Dan Larhammar

Department of Neuroscience, Science for Life Laboratory, Uppsala University, Box 593, SE-751 24 Uppsala, Sweden

article info

Article history:
Received 17 April 2012
Accepted 6 July 2012
Available online 17 July 2012

Keywords:
Tetraploidization
Gene duplication
Phototransduction
G-protein
Transducin
Teleost fish

abstract

Hundreds of gene families expanded in the early vertebrate tetraploidizations, including many gene families in the phototransduction cascade. We have investigated the evolution of the heterotrimeric G-proteins of photoreceptors, the transducins, in relation to these events using both phylogenetic analyses and synteny comparisons. Three alpha subunit genes were identified in amniotes and the coelacanth, GNAT1–3; two of these were identified in amphibians and teleost fish, GNAT1 and GNAT2. Most tetrapods have four beta genes, GNB1–4, and teleosts have additional duplicates. Finally, three gamma genes were identified in mammals, GNGT1, GNG11 and GNGT2. Of these, GNGT1 and GNGT2 were found in the other vertebrates. In frog and zebrafish additional duplicates of GNGT2 were identified. Our analyses show that all three transducin families expanded during the early vertebrate tetraploidizations and that the beta and gamma families gained additional copies in the teleost-specific genome duplication. This suggests that the tetraploidizations contributed to visual specialisations.

© 2012 Elsevier Inc. All rights reserved.

⁎ Corresponding author. Fax: +46 18511540. E-mail address: [email protected] (D. Lagman).

1 Present address: Department of Medical Biochemistry and Microbiology, Uppsala University, Box 582, SE-751 23 Uppsala, Sweden.

1. Introduction

Many gene families have retained copies from the two rounds of tetraploidization (1R and 2R) that occurred in the early vertebrate ancestors and the third round (3R) that occurred in the lineage leading to teleost fish. This conclusion is supported not only by studies of several gene families and their chromosomal regions, such as the endothelin system [1], HOX gene clusters [2], the neuropeptide Y (NPY) system [3,4] and the opioid system [5,6], but also by whole genome sequence analyses [7,8]. Each group of homologous regions is called a paralogon [9].

Preliminary analyses using sequence-based phylogenies combined with chromosomal data of the transducin subunit gene families have proposed that they also expanded in these tetraploidizations [10]. Transducins are the heterotrimeric G proteins (guanine nucleotide binding proteins) in the phototransduction cascade of vertebrate visual photoreceptor cells, relaying the signal from the opsins to phosphodiesterase 6 (PDE6) [11]. The heterotrimeric G proteins consist of three subunits named Gα, Gβ and Gγ, and the genes that encode these three subunits, named GNA, GNB and GNG respectively, form three unrelated gene families. Each family has multiple members in vertebrate genomes. The GNA and GNG gene families in particular consist of several subfamilies, out of which we have analysed those that include members expressed in rods and cones. These two cell types express different but related genes from all three transducin subunit families [12–14].

The human genome contains sixteen GNA, five GNB and thirteen GNG genes. The superfamily encompassing the sixteen GNA genes can be subdivided into four different classes based on sequence similarities and functional specialisations: GNAI, GNAS, GNAQ and GNA12/13 [15,16]. An additional class, GNAV, has been identified in teleost fish as well as in invertebrates, giving a total of five GNA gene classes [15]. The GNAI class consists of the four families GNAI, GNAZ, GNAO and GNAT, where the latter family includes the two genes encoding the visual transducin Gα subunits [16]: GNAT1 is specifically expressed in rods and GNAT2 specifically in cones [17]. In addition, there is a third member of this family, GNAT3, also known as gustducin, expressed mainly in taste receptor cells [18].

The three genes GNAT1–3 are located on different chromosomes and each gene is located adjacent to a GNAI gene [10]. Specifically, GNAT1 is located next to GNAI2, GNAT2 is close to GNAI3 and GNAT3 is together with GNAI1 [12]. Several studies suggest a tandem duplication of an ancestral GNA gene as the origin of the ancestral GNAT and GNAI pair, followed by chromosomal duplications leading to three such pairs [10,19,20].

The five genes encoding Gβ in mammalian genomes are named GNB1, GNB2, GNB3, GNB4 and GNB5 and are located on five different chromosomes in the human genome. The mammalian proteins encoded by the GNB1–4 genes share a relatively high protein sequence identity, 80–88%, while the GNB5 protein only shares about 50% identity with the others [17]. The GNB1–4 genes seem to belong to a paralogon that arose in 2R, based on their chromosomal locations [10,12]. However, their sequence-based phylogenies have not been totally congruent with this conclusion [10,12]. Out of these four genes, GNB1 is expressed in rods and GNB3 in cones and their protein products are thus used in the transducin heterotrimers [14].
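The identity figures quoted for the Gβ proteins (80–88% among GNB1–4, about 50% for GNB5) are pairwise percentages over aligned residues. As a minimal sketch of how such a figure can be computed, the Python function below assumes two sequences that have already been aligned to equal length and assumes that columns containing a gap are skipped. The article does not state which denominator was used, so that choice, the function name `percent_identity` and the toy fragments are all hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped, so the
    denominator is the number of gap-free aligned columns. This is an
    assumption; other conventions divide by alignment or sequence length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(seq_a.upper(), seq_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, not real GNB sequences).
toy_a = "MSELDQLRQE"
toy_b = "MSELEQLRQE"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Because the result depends on how gaps are treated, identities computed this way are only comparable between studies that use the same convention.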

0888-7543/$ – see front matter © 2012 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2012.07.005

D. Lagman et al. / Genomics 100 (2012) 203–211

There are thirteen GNG genes in the human genome and the proteins encoded by these genes, Gγ, are short, only about 70 amino acids, and bind tightly to the Gβ subunits [21,22]. The GNG genes of particular interest with regard to photoreceptor cells are the closely related GNGT1, GNGT2 and GNG11 genes [22]. GNGT1 and GNGT2 are expressed in rods and cones, respectively [21]. Furthermore, a study on the interactions between the five beta subunits and some of the gamma subunits has revealed that Gβ1 has high affinity towards Gγt1 [23]. GNG11 is expressed in a variety of tissues but outside of the retina and is probably not involved in the vertebrate visual phototransduction cascade [22]. The GNG11 gene has only been identified in mammals and is located in tandem with GNGT1, suggesting a local duplication from GNGT1 in the mammalian lineage [10].

The GNGT1, GNG11 and GNGT2 genes are located in the same genomic regions as the developmentally important HOX gene clusters [2,10]. These gene clusters are well studied in relation to the early vertebrate genome doublings and the subsequent duplication in the teleost lineage. Most vertebrates have four clusters (post 2R) and teleost fish at least seven (post 3R) [24,25].

Here we investigate in detail the evolution of the three transducin gene families to see if their evolution correlates with duplications in the basal vertebrate tetraploidizations and with the origin of rod and cone photoreceptor cells. Since the transducin subunits are highly conserved, and the gamma subunits in particular are quite small, they give unreliable sequence-based phylogenetic trees. Therefore we have also analysed several neighbouring gene families to get more accurate phylogenies for these chromosomal regions.

2. Results

2.1. Transducin alpha subunit genes (GNAT)

Three GNAT genes, GNAT1–3, were identified in amniotes and the sarcopterygian fish included in this study, while only GNAT1 and GNAT2 could be identified in the teleost and amphibian genomes (Fig. 1A). BLAST [26] searches to identify putative GNAT orthologs in invertebrate genomes were unsuccessful. Two GNAT genes, named short- (GαtS) and long (GαtL) transducin alpha subunit, have previously been reported and cloned in sea lamprey [27]. These were referred to as Pma.Short (GenBank accession no. ACB69761.1) and Pma.Long (GenBank accession no. ACB69760.1) respectively in the phylogenetic trees. The sea lamprey GNAT amino acid sequences were included in the analysis to provide relative dating for the transducin alpha subunit divergences.

The phylogenetic maximum likelihood (PhyML) and neighbour-joining (NJ) trees of the GNAT family were rooted using the midpoint-root method and show that the sequences form three distinct clusters with good statistical support (Figs. 1A and S1). The sea lamprey sequences are distantly related to all other GNAT sequences, with basal branching points in both phylogenetic analyses. However, their orthology relationships with the gnathostome sequences are unclear. Both Pma.Long and Pma.Short branch basally to GNAT1 in the PhyML analysis (Fig. 1A), while in the NJ analysis Pma.Long clusters basally to both GNAT1 and GNAT3 while Pma.Short clusters with the GNAT1 sequences (Fig. S1).

The GNAT gene amino acid sequences are highly conserved in size and sequence. The GNAT1 sequences consist of 350 amino acid residues in the investigated species; the tetrapod GNAT2 and GNAT3 sequences each consist of 354 residues; and the teleost GNAT2 ortholog sequences consist of 350 residues. The latter sequences lack four amino acid residues close to the amino terminus at the same positions as all GNAT1 sequences, positions 12–15 in the GNAT alignment. Human and zebrafish GNAT1 amino acid sequences share 93% identity with each other and human and zebrafish GNAT2 share 81% identity. The specific amino acid residues forming contacts with the beta–gamma dimer [28] are all conserved throughout the GNAT family in all vertebrate sequences included in this study. However, we could observe individual GNAT1, GNAT2 and GNAT3 subtype specific amino acid differences within the region required for beta–gamma dimer binding, positions 1–29, which overlaps with the first opsin-binding region [28], positions 7–27. In this region the GNAT3 sequences have serine at position seven while all other sequences have alanine: GNAT1 has at position

nine while GNAT2 and the sea lamprey sequences have aspartic acid and GNAT3 has serine except for the coelacanth sequence which has asparagine: GNAT1 has at position 24 while GNAT2 and GNAT3 have glutamine and the sea lamprey sequences have an alanine: GNAT3 sequences have arginine at position 29 while all other sequences mostly have lysine. In the second opsin-binding region [28], positions 315–333, the amniote GNAT3 sequences and the sea lamprey sequences have glutamic acid at position 315 while the teleost GNAT2 has glycine and the rest of the sequences mostly have aspartic acid at this position. The amniote GNAT3 has aspartic acid at position 316 while the rest of the sequences have a different amino acid at this position, mostly valine. The amino acid positions important in interactions between subunits and receptors mentioned above have been highlighted with coloured boxes in the GNAT alignment (Fig. S2).

Fig. 1. Phylogenetic maximum likelihood trees of the GNAT and GNAI gene families. A) Midpoint-rooted PhyML tree with vertebrate GNAT1, GNAT2 and GNAT3 amino acid sequences. B) PhyML tree with the vertebrate GNAI1, GNAI2 and GNAI3 and invertebrate orthologous amino acid sequences. The tree is rooted with the identified fruit fly ortholog; the root is not displayed. The three letter abbreviations represent the species and the numbers or roman numerals represent the chromosome or genomic scaffold carrying the gene. Bootstrap values are shown at nodes.

2.1.1. Analyses of the neighbouring GNAI gene family

The GNAT and GNAI (G protein alpha inhibitory subunit) genes share the same exon–intron organisation with eight exons [17]. In the majority of vertebrate species each GNAT gene is located in a pair together with a GNAI gene; it is therefore of interest to study also the evolution of the GNAI family in relation to the GNAT family. Three GNAI genes have been identified in tetrapods: GNAI1, GNAI2 and GNAI3 [10,19,20]. We identified these three genes in all tetrapods investigated as well as in the sarcopterygian fish, the coelacanth. Orthologs of all three tetrapod genes, with additional duplicates of GNAI1 and GNAI2, were identified in the teleost genomes. However, one of the GNAI2 duplicates could not be identified in the medaka genome database. Single putative GNAI orthologs were identified in fruit fly, purple sea urchin, Florida lancelet and sea squirt genome databases. The sequence-based phylogenetic analyses of the GNAI genes show a topology consistent with an expansion during the vertebrate tetraploidizations. The vertebrate GNAI genes form three clusters with GNAI1, GNAI2 and GNAI3, respectively, that diverge from each other after the branching points of the invertebrate sequences (Fig. 1B). These analyses were rooted with the fruit fly ortholog. A more comprehensive analysis of the chromosomal region of the GNAT and GNAI genes is in progress, involving several additional gene families consistent with this scenario.

2.2. Transducin beta subunit genes (GNB)

The GNB5 gene was excluded from the analysis since it formed a separate clade rooted with invertebrate sequences in the preliminary analyses (data not shown). Four genes of the GNB1–4 group were identified in the GNB family in all but two of the tetrapod genomes included in the study: the GNB2 gene could not be identified in the genome databases of chicken and western-clawed frog. Further BLAST searches were performed in zebra finch (Taeniopygia guttata), mallard (Anas platyrhynchos) and turkey (Meleagris gallopavo). These searches were also unfruitful. However, all four genes could be identified in the database of the green anole lizard. In both spotted green pufferfish and three-spined stickleback, six GNB genes of the GNB1–4 group were identified while five were found in the medaka genome database and eight in zebrafish. All fish species except medaka have at least two copies of GNB1. Zebrafish has three GNB1 homologs, two of which are placed adjacent to one another on chromosome 6. These two share 100% identity at the amino acid level and differ by only 1.5% at the nucleotide level. All the teleost fish genomes studied have two GNB3 homologs that have been named gnb3a and gnb3b. The zebrafish, in contrast to all the other teleosts included in the study, possesses two GNB4 homologs as well. Single putative homologous genes of the GNB1–4 genes were identified in sea squirt, purple sea urchin and fruit fly while two were identified in the Florida lancelet. The two Florida lancelet genes are located in tandem on the same genomic scaffold, suggesting an independent duplication in this lineage.

In the rooted PhyML analysis of the protein sequences encoded by the GNB1–4 genes we can observe four well-supported clusters, one for each GNB gene; the tree was rooted with the identified fruit fly ortholog (Fig. 2). We can also observe short branch lengths in the GNB1, GNB2 and GNB4 clusters, indicating strong conserving evolutionary pressure. The GNB3 cluster, however, has relatively longer branch lengths, indicating more rapid evolution of these genes. In the NJ analysis we see a similar topology except for a different placement of the tunicate ortholog basally to the GNB3 cluster (Fig. S3).

The human GNB3 amino acid sequence shares 83% identity with the human GNB1 sequence, 81% identity with GNB2 and 80% identity with GNB4. The three zebrafish GNB1 amino acid sequences (two of which are identical) share 98% and 99% identity with the human GNB1 sequence; the zebrafish GNB2 protein shares 95% identity with the human ortholog; zebrafish gnb3a shares 80% and gnb3b 67% identity with the human GNB3; and the zebrafish GNB4 proteins share 89% and 93% identity with human GNB4. Thus, the GNB3 amino acid sequences are the most divergent in this group. The 16 amino acid residues involved in Gαt binding [29] are all conserved in GNB1–4 except in the gnb3b sequences of teleost fish. In these sequences the amino acid isoleucine, residue 87 in the GNB alignment, has been replaced by leucine, a highly conservative change. In the gnb3b sequences of medaka, spotted green pufferfish and three-spined stickleback, residue 105, serine, also involved in Gαt binding, has been changed to alanine. The residues interacting with the gamma subunit [29] are mostly conserved or show conservative differences between the different GNB subtypes; the most deviating are the teleost GNB3 sequences. For the GNB amino acid alignment see Fig. S4.

Fig. 2. Phylogenetic maximum likelihood tree of the GNB1–4 gene family. Rooted PhyML tree with vertebrate GNB1–4 and invertebrate orthologous amino acid sequences. The tree is rooted with the identified fruit fly ortholog. Sequence names are applied as in Fig. 1. Bootstrap values are shown at the nodes. Clusters are marked with colours representing the different human chromosomal regions: Hsa1, Hsa3, Hsa7 and Hsa12.

2.2.1. Analyses of chromosomal GNB neighbours

Our selection criteria resulted in the identification of 11 neighbouring gene families for the analysis of conserved synteny. Preliminary phylogenetic analyses of three of these gene families, namely the endothelin converting enzyme (ECE), ephrin type precursor (EPHR) and peroxisomal biogenesis factor (PEX), show complex topologies that indicate duplications in an earlier time-frame than vertebrate evolution (data not shown). These three families were therefore excluded from the subsequent analyses. The families included in the analysis of conserved synteny are: sodium channel, non-voltage gated 1 (SCNN1), chromodomain helicase DNA binding protein (CHD), dishevelled (DVL), intermediate filament family orphan (IFFO), leucine proline-enriched proteoglycan (leprecan) (LEPRE), mitofusin (MFN), solute carrier family 2 (SLC2A) and ubiquitin specific peptidase (USP). For phylogenetic trees of the identified neighbouring gene families see Figs. S5–S12.

2.3. Transducin gamma subunit genes (GNG)

Three genes of the class encompassing the GNGT genes were identified in all placental mammals: GNGT1, GNGT2 and GNG11. GNG11 is located immediately adjacent to GNGT1, thus reflecting local gene duplication. GNGT1 and GNG11 could not be identified in the opossum database. Two were identified in birds and reptiles, GNGT1 and GNGT2. In the western-clawed frog three GNGT genes, one GNGT1 and two GNGT2, were identified. In all teleost fish except medaka both GNGT1 and GNGT2 could be identified. The zebrafish has two GNGT2 genes annotated as gngt2a and gngt2b in the present genome assembly (Zv9).

The zebrafish GNGT1 amino acid sequence shares 66% sequence identity with human GNGT1. The zebrafish GNGT2 amino acid sequences share 70% (gngt2a) and 55% (gngt2b) identity with human GNGT2, respectively. Many of the amino acid residues of the Gγt subunits make contact with the Gβ subunit [29]. The major difference between the GNGT1 and GNGT2 amino acid sequences lies in the N-terminal end of the polypeptide. The first four residues of the GNGT1 (and GNG11) sequences are involved in Gβ binding; however, GNGT2 is four amino acid residues shorter and lacks these four N-terminal residues (alignment in Fig. S13). Preliminary phylogenetic analyses of the GNGT genes show that they give unreliable phylogenetic trees, probably due to their short length (about 70 amino acid residues) in combination with a relatively high sequence identity.

2.3.1. Analyses of chromosomal GNGT neighbours

The GNGT genes are located in the paralogon that houses the HOX gene clusters. This paralogon has previously been found to be the result of 2R and 3R [2], suggesting duplication of the gamma transducin genes as part of the same chromosomal blocks. The paralogous HOX regions in the human genome are located on chromosomes 17, 7, 2 and 12. In the human genome, the GNGT1 gene is located approx. 66 Mb upstream of the HOXA cluster on chromosome 7, and the GNGT2 gene is located approx. 0.47 Mb upstream of the HOXB cluster on chromosome 17. In the zebrafish genome gngt1 is located approx. 22.6 Mb downstream of HOXAa on chromosome 19, gngt2a is located approx. 0.6 Mb upstream of the HOXBa on chromosome 3 and gngt2b is located approx. 24.8 Mb downstream of the HOXBb on chromosome 12. These chromosomal locations are consistent with duplication of an ancestral teleost GNGT2 gene in the 3R event. For additional data on distances of the GNGT genes from the HOX gene clusters in all investigated species see Table S2.

3. Discussion

Transducins are key proteins in of vertebrates. Since the components of the phototransduction cascade have been subjected to strong evolutionary selection mechanisms, the phylogenies of vision genes have been difficult to resolve. In this regard, the opsins are a particularly good example [30–33]. For the transducin subunit genes, the challenge arises because of different factors that limit the resolution of the sequence-based phylogenetic analyses, namely the large degree of conservation of both the alpha and beta subunit genes, and the short length of the gamma subunit genes. In this study, we have combined sequence-based analyses with chromosomal location data, allowing us to deduce the evolution of the transducin subunit genes by analysing also the neighbouring gene families in these chromosomal regions. When combined, these two data sets give a more reliable picture of the evolutionary history of the vertebrate transducins.

3.1. Transducin alpha family

The ancestral transducin alpha subunit gene (GNAT) and a neighbouring ancestral GNAI gene probably arose by a local duplication of an ancestral GNAI/GNAT gene, as evidenced by their shared chromosomal location and exon–intron structure [10,19,20]. This duplication probably took place before the divergence between protostomes and deuterostomes, as evidenced by the presence of GNAI orthologs in several invertebrate genomes (Fig. 1B). These genes cluster together with the GNAI genes and not the GNAT genes in our preliminary analyses of the whole G protein alpha subunit gene family (data not shown). This would imply that the GNAT gene was lost from the invertebrate deuterostomes investigated in this study. The duplication could also have taken place from a GNAI gene later in the vertebrate lineage after the split from the tunicate lineage, as suggested by the lack of clear GNAT orthologs in invertebrate genomes. Since no clear GNAT ortholog could be identified in any of the invertebrate genomes investigated, the GNAT family tree was left un-rooted. This pair of one GNAT and one GNAI gene was later duplicated to form the two families we see today, as indicated by the chromosomal location and species repertoire of the respective gene families (Figs. 1 and S1).

The unsuccessful searches for GNAT3 genes in the investigated amphibian and teleost fish genomes are consistent with results from earlier studies [10,19,20]. However, it is unlikely that GNAT3 arose as a duplicate in the amniote lineage, as suggested by the species repertoire of the cluster, because the orthologs of the GNAI1 gene located next to GNAT3 in amniote genomes are present in the genomes of all teleost fish and the amphibian included in this study. In the genome of a sarcopterygian fish, the coelacanth (Latimeria chalumnae), the genes annotated as GNAT1 and GNAT2 are located adjacent to their respective GNAI2 and GNAI3 genes. The GNAT3 and GNAI1 genes were also identified, however not on the same scaffold, probably due to short scaffold lengths. This further strengthens the hypothesis that independent GNAT3 losses occurred in the amphibian, teleost fish (Fig. 4) and possibly sea lamprey lineages as previously proposed [10,19,20]. A more extensive analysis of the chromosomal regions housing the GNAI and GNAT genes is in progress to further establish if these duplications correlate with the early vertebrate tetraploidizations. However, due to the extensive rearrangements of some of these chromosomes in the teleost lineage after 3R [34], this issue is challenging to resolve, as recently shown by an analysis of the genome of the spotted gar (Lepisosteus oculatus) [35], which belongs to a ray finned fish lineage that diverged from the teleost lineage before 3R.

We found no 3R or additional duplicates of the GNAT genes in the analysed teleost genomes. It is possible that the functions of the Gαt proteins are so conserved and their expression so fine-tuned that any additional copies would have compromised cell functions and were therefore not tolerated and thereby lost due to negative selection post 3R.

The differing branch-points of the sea lamprey GNAT sequence called Pma.Long between the PhyML and NJ analyses (Figs. 1A and S1) make it difficult to determine the orthology relationships of the sea lamprey genes. This species possesses two types of photoreceptor cells (long and short photoreceptors) which each have properties similar to

rods and cones [27]. It is possible that the two sea lamprey GNAT genes represent the gnathostome GNAT1 and GNAT2 genes, but have undergone divergent evolution. They were named after the lamprey photoreceptor cell type they were expressed in, GαtS (Pma.Short) in short photoreceptor cells and GαtL (Pma.Long) in the long photoreceptor cells [27].

Fig. 3. Conserved synteny of the GNB paralogon. Identified neighbouring protein families of GNB1–4 genes and their chromosomal locations in the human, opossum, chicken, zebrafish and three-spined stickleback genomes. Boxes are coloured after corresponding human chromosome and correspond to the colours of the clusters in the GNB PhyML tree (Fig. 2) and NJ tree (Supplementary Fig. S3).

The differences we observe between the vertebrate GNAT1, GNAT2 and GNAT3 amino acid sequences within the region required for beta–gamma dimer binding possibly account for the heterotrimer compositions between rods and cones. Furthermore, the differences in the first receptor-binding region could contribute to promote either binding or opsin interaction. It has recently been shown in mice that GNAT2 may interact with rhodopsin [36]. However, the kinetics of the rod photoreceptor cells changed to more resemble the kinetics of cones [36]. This shows how the transducin alpha subunit can be utilised by other G protein coupled receptors (GPCRs) than the opsins. It is possible that the observed differences in the receptor-binding regions of the alpha subunits may help to sort out opsin-transducin interaction.

3.2. Transducin beta family

The sequence identities between human and zebrafish GNB1–4 amino acid sequences, as detailed in the Results section, show that they are a highly conserved family. Of the 16 amino acid residues involved in Gαt binding only two differences are found, namely in teleost gnb3b. At position 87 in the GNB amino acid sequence alignment, a change to leucine from isoleucine can be observed in all teleost sequences. Position 105 has a change to alanine from serine in gnb3b in the lineage leading to

medaka, green spotted pufferfish and three-spined stickleback. It is possible that the differences in the GNAT-interacting and the GNGT-interacting amino acid residues contribute to subunit composition in the transducin heterotrimers (alignment in Fig. S4).

Fig. 4. Schematic tree showing GNAT gene losses in three gnathostome lineages. GNAT3 has been lost by independent events at least two times in the gnathostome lineage. Arrowheads and crossed over white boxes in the GNAT column represent GNAT3 losses.

As shown by the long branch lengths of the GNB3 cluster in the PhyML and NJ trees (Figs. 2 and S3), this is clearly the most rapidly evolving member of the family. Nevertheless, the chromosomal data we present here are in line with what Nordström et al. suggested: the GNB3 genes and the other three genes, GNB1, GNB2 and GNB4, share a common ancestry and arose in the early vertebrate tetraploidizations [10]. Analysis of the neighbouring gene families supports quadruplication of a single ancestral region in 2R and additional duplication in 3R (Fig. 3). All four of the transducin beta subunit genes have been retained in the gnathostome genomes investigated, except those of chicken and western-clawed frog. Furthermore, 3R duplicates have been retained for the GNB1 and the GNB3 genes in the teleost fish genomes included in this study, except for the medaka genome, which is missing one of the GNB1 3R duplicates. Zebrafish has retained a 3R duplicate also of GNB4. The two adjacent GNB1 genes on zebrafish chromosome 6 are probably the result of a very recent local duplication in the zebrafish lineage or possibly due to an error in the assembly of the genome sequence.

Thus, we conclude that the GNB paralogon originated from a single chromosome block that was quadrupled in the two rounds of genome duplication before the radiation of vertebrates. The resulting chromosomal quartet was later duplicated once more in the teleost ancestors, in 3R, giving teleost fish additional paralogous chromosomes (Fig. 3). In the human lineage the comparative analysis of synteny shows that the GNB2 gene has been translocated from chromosome 17 to chromosome 7. The opossum GNB2 gene is located in a region of chromosome 2 that is homologous to human chromosome 17. Rearrangements were also found in other lineages. Just as in the human genome, GNB2 seems to have been translocated in the zebrafish genome. However, this is not the case in the three-spined stickleback genome, which suggests independent translocation in the zebrafish lineage (Fig. 3). Individual genes of the neighbouring gene families have also been translocated and one gene family has been completely lost in teleosts, namely the SCNN1 gene family. We identified putative members of this family in the sea lamprey (trees in Fig. S5). The translocations and gene losses observed in the teleost lineage are in part due to rearrangements that occurred after 3R [34]. In our synteny analyses, the chicken orthologs of the genes located on human chromosome 17 and the GNB2 region of opossum chromosome 2 could not be identified in the current genome assembly. This is possibly due to sequencing or assembly errors. In spite of these rearrangements and losses in several lineages, the sequence-based phylogenetic analysis of these neighbouring gene families supports the hypothesis of this region, including the GNB1–4 genes, expanding in 1R, 2R and subsequently in 3R. Phylogenetic trees for the neighbouring gene families are shown in Supplementary Figs. S5–S12.

The data on the chromosomal regions presented here are also consistent with comparative whole genome analyses: the GNB1–4 bearing regions on human chromosomes 1, 7, 12 and 3 seem to originate from the same deduced ancestral linkage group or chromosome, both in a reconstruction of linkage between the Florida lancelet and human genomes, namely the ancestral chordate linkage group 10 [37], and in a bioinformatic reconstruction of the vertebrate ancestral karyotype, ancestral vertebrate chromosome F [8].

3.3. Transducin gamma family

The short amino acid sequences of the gamma subunits preclude reliable sequence-based phylogenetic analyses. The major difference in the GNGT amino acid sequences is that the four N-terminal amino acid residues of GNGT1 are not present in GNGT2. This could be a contributing factor to the binding specificity of the GNB subunit towards the GNGT subunit observed in previous studies [23]. See alignment in Supplemental Fig. S13. These limitations make the chromosomal locations of the GNGT genes even more important in the work to elucidate their evolution in the vertebrate lineage. Fortunately, the positions of the GNGT1 and GNGT2 genes within the HOX gene regions, together with the species distribution of these two genes throughout the vertebrate clade, give valuable information. With the extended species repertoire we provide in this analysis, we can further strengthen the conclusions made by Nordström et al. that these genes probably share their chromosomal region with the HOX genes and therefore expanded in a similar way [10]. Previous studies of the paralogon containing the HOX gene clusters have supported the ancestral quadruplication of these chromosome regions in 2R [2], thereby arguing strongly for the duplication of the GNGT genes through the same mechanism. As noted before, a third gene, GNG11, has only been identified in the mammalian lineage, where it is adjacent to GNGT1. This gene probably arose through a local duplication of GNGT1 in an early mammalian ancestor [10]. The GNGT1 and GNG11 genes could not be identified in the opossum. However, as both these genes are present in the wallaby (Macropus eugenii) genome assembly, it cannot be excluded that they are missing from the opossum genome database due to sequencing or assembly errors.

The two GNGT2 genes of the western-clawed frog are different from each other in the amino acid sequence and are probably the result of a lineage specific duplication. GNGT2 could be identified in all teleost species except medaka. Since the medaka retina has four kinds of cone photoreceptor cells [38], it would be remarkable if GNGT2 truly had been lost in this species; therefore we performed BLAST searches in the NCBI expressed sequence tag (EST) database of medaka. In these searches several ESTs could be identified as possible GNGT2 mRNAs, so presumably the gene is simply missing in the genome assembly. In the zebrafish genome we could identify two GNGT2 gene predictions, annotated as gngt2a and gngt2b, located on chromosomes 3 and 12 respectively. These homologous chromosome regions have previously been reported to be the result of 3R [2], which supports the duplication of GNGT2 in the same event, even though one of the duplicates seems to have been lost in all teleost species studied except for zebrafish. As zebrafish belongs to a lineage that diverged relatively early in teleost evolution [39], one 3R duplicate could have been lost once in the ancestor of the other three species. It is not possible from sequence or synteny analyses to say whether it is gngt2a or gngt2b that has been retained in the other teleosts.

4. Conclusions

Our analyses show that all three gene families that form the transducin heterotrimers expanded in the two basal vertebrate tetraploidizations. This is the most parsimonious interpretation of the combined datasets consisting of sequence-based phylogenetic analyses as well as chromosomal analyses of conserved synteny, including phylogenetic analyses of neighbouring genes. Thus, the early vertebrate tetraploidizations provided the basis for the subsequent specialisation of transducin subunits leading to differential expression in either rods or cones. We also report that both the transducin beta and gamma subunit gene families have retained 3R duplicates, as seen in teleosts with duplicate GNB1, GNB3 and GNGT2 genes (Fig. 5). It is still unknown whether the transducin 3R copies may have been subfunctionalized by being expressed in different subpopulations of teleost photoreceptor cells. Due to the high sequence identity between 3R duplicates, it will be a challenge to develop specific reagents that can distinguish these duplicates at the protein or mRNA level for analyses of differential expression. Investigations of the expression patterns are in progress in our laboratory with in situ hybridization in the zebrafish retina. If differential expression of transducin subunit duplicates is observed, the retina of some teleost fishes would be even more specialised than the retina of mammals with regard to , in addition to what is already known from the higher number of opsin types in teleost fish.

genome database, release 60, (http://www.ensembl.org/index.html) from the following species: human (Homo sapiens, Hsa), mouse (Mus musculus, Mmu), grey short-tailed opossum (Monodelphis domestica, Mdo), chicken (Gallus gallus, Gga), green anole lizard (Anolis carolinensis, Aca), western clawed frog (Silurana (Xenopus) tropicalis, Str), zebrafish (Danio rerio, Dre), green spotted pufferfish (Tetraodon nigroviridis, Tni), three-spined stickleback (Gasterosteus aculeatus, Gac), medaka (Oryzias latipes, Ola), sea squirts (Ciona intestinalis, Cin, and Ciona savignyi, Csa) and fruit fly (Drosophila melanogaster, Dme). In cases where several transcripts were present in the database, the longest transcript or the transcript that had a length corresponding to the majority of the other genes in the family was used. Basic Local Alignment Search Tool (BLAST) [26], using the tBLASTn method, was used for searches to identify missing sequence stretches or non-annotated family members. Full genomic sequence of regions containing non-annotated members was collected and the amino acid sequence was annotated using the Genscan web server (http://genes.mit.edu/GENSCAN.html) [40] complemented with manual annotations.

If any gene prediction was missing in one of the species included
they were searched for using BLAST in the genome of the species, if they still not could be identified, further searches were performed 5. Materials and methods in closely related species, BLAST searches for missing gene sequences were performed in the genome databases of zebrafinch (T. guttata, Amino acid sequence predictions of transducin GNAT, GNB and GNGT http://www.ensembl.org/Taeniopygia_guttata/Info/Index), mallard subunit gene families were identified and retrieved from the Ensembl (A. platyrhynchos, http://pre.ensembl.org/Anas_platyrhynchos/Info/ Index), turkey (M. gallopavo, http://www.ensembl.org/Meleagris_ gallopavo/Info/Index), wallaby (M. eugenii, http://www.ensembl.org/ Macropus_eugenii/Info/Index) and coelocanth (L. chalumnae, Lch, http://www.ensembl.org/Latimeria_chalumnae/Info/Index). The human protein sequences for all members in the three main pro- tein families were used to perform BLAST searches in the Florida lancelet (Branchiostoma floridae, Bfl)genome(http://genome.jgi-psf.org/Brafl1/ Brafl1.home.html) and the purple sea urchin (Strongylocentrotus purpuratus, Spu) genome (http://www.hgsc.bcm.tmc.edu/project- species-x-organisms.hgsc) to identify putative transducin subunit orthologs in these species. Two cloned GNAT sequences from sea lamprey (Petromyzon marinus, Pma) were also included in the analysis [27].
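The transcript-selection rule described above (use the longest transcript, or the one whose length matches most other family members) can be illustrated with a minimal Python sketch. This is not the authors' procedure, which appears to have been applied manually. The function name `pick_transcript`, the input layout and the use of the family median length as a stand-in for "the length of the majority of the other genes" are all assumptions made for illustration.

```python
from statistics import median


def pick_transcript(transcripts, family_lengths=None):
    """Return the ID of the transcript to keep for one gene.

    transcripts: dict mapping a transcript ID to its amino acid sequence.
    family_lengths: optional list of protein lengths of the other family
        members (hypothetical input; the median is used as an assumed
        proxy for the typical family length).
    """
    if not transcripts:
        raise ValueError("no transcripts given")
    if family_lengths:
        # Keep the transcript whose length is closest to the family median.
        target = median(family_lengths)
        return min(
            transcripts,
            key=lambda tid: (abs(len(transcripts[tid]) - target), tid),
        )
    # Default rule: keep the longest transcript (ties resolved by ID).
    return max(transcripts, key=lambda tid: (len(transcripts[tid]), tid))


# Hypothetical example data: three transcripts of one gene.
example = {"T1": "M" * 350, "T2": "M" * 354, "T3": "M" * 120}
print(pick_transcript(example))
print(pick_transcript(example, family_lengths=[350, 350, 351]))
```

With no family information the sketch falls back to the longest transcript. When lengths of other family members are supplied, it prefers the transcript closest to them, which guards against choosing an unusually long, possibly mispredicted transcript.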

5.1. Phylogenetic analyses

Multiple sequence alignments of the protein families were created using the ClustalW web tool [41] included in Jalview 2.5 version 10.0 [42]. The alignments were manually curated following sequence homology and consensus for splice donor and acceptor sites, as well as the identification of conserved domains. Phylogenetic Neighbour-Joining (NJ) trees were created from the alignments using ClustalX 2.0.12 [41] with 1000 bootstrap replicas and standard settings. The best amino acid substitution model for each alignment was calculated using ProtTest 2.4 [43] with the following settings: BIONJ as guide tree, slow as optimization strategy, all substitution matrices without add-ons used and nCat set to 4. Phylogenetic maximum likelihood trees (PhyML) were constructed using the PhyML 3.0 web server [44] available at http://atgc.lirmm.fr/phyml/. The substitution models calculated in ProtTest were used together with the following settings: empirical equilibrium frequencies, proportion of invariable sites and gamma shape parameter were estimated from the dataset, BIONJ was used as starting tree, eight substitution rate categories were used, no aLRT was computed and SPR was used as tree improvement method. The trees were finally bootstrapped with 100 replicas.

Trees were rooted primarily with the identified fruit fly orthologs for each family. If one could not be identified, trees were rooted with nematode (Caenorhabditis elegans, Cel) or sea squirt family members identified in the Ensembl database. One tree is left unrooted (Fig. S9) since no invertebrate ortholog could be identified. The GNAT phylogenetic analyses (Figs. 1A and S1) have been midpoint-rooted, assuming that the evolutionary rate across the family has been relatively low and constant across branches. This strategy was applied since no invertebrate orthologs of GNAT could be identified.

Fig. 5. Proposed evolutionary history of the GNAT, GNB and GNGT gene families by genome duplication events. Arrows indicate the 2R and 3R tetraploidization events. Boxes represent genes and dashed boxes represent gene losses.

5.2. Analyses of chromosomal neighbouring protein families

Different strategies to identify and analyse neighbouring gene families were employed for the three different main gene families. For the GNAT1–3 gene family the chromosomal region is highly rearranged, so the closest neighbour both evolutionarily and by location, GNAI, was used and analysed. For the GNB1–4 gene family, neighbours were selected by downloading lists of protein families, as predicted in Ensembl release 60, from a region spanning 5 Mb upstream and downstream of each GNB1–4 gene. This was performed for the genomes of human, chicken and zebrafish. Families with members present on at least two GNB-bearing chromosomes in at least two of the species were selected for further analyses. The GNGT family members are only present on two human chromosomes and therefore the sorting utilised for the GNB1–4 family was not possible. In this case the location within the HOX paralogon helped in deducing the evolutionary history of this family.

Amino acid sequence predictions of the selected neighbouring gene families were collected using the BioMart function in Ensembl. BLAST searches were performed in order to identify non-annotated sequences. The sequences for each neighbouring protein family were aligned and used to construct NJ and PhyML trees, with the methods and settings described for the transducin GNAT, GNB and GNGT subunit protein families. The species included in the alignments and trees of the neighbouring protein families were: human, opossum, chicken, zebrafish, spotted green pufferfish, medaka, three-spined stickleback, sea squirts and fruit fly. If sequences were missing in any species, these amino acid sequences were searched for and retrieved from a relatively closely related species. Therefore sequences from green anole lizard, sea lamprey (retrieved from the Pre! Ensembl release 60: http://www.pre.ensembl.org, assembly: WUGSC 3.0/petMar1, available on the UCSC genome browser: http://genome.ucsc.edu), Florida lancelet and the nematode were added to some neighbouring protein family alignments and trees.

6. Role of the funding sources

This work was supported by grants from the Swedish Research Council and Carl Trygger's Foundation. The funding sources had no involvement in study design, data collection, analysis and interpretation of data, writing of the report, or in the decision to submit the article for publication.

7. Disclosure statement

All authors declare no conflict of interest.

Supplementary data related to this article can be found online at http://dx.doi.org/10.1016/j.ygeno.2012.07.005.

References

[1] I. Braasch, J.-N. Volff, M. Schartl, The endothelin system: evolution of vertebrate-specific ligand–receptor interactions by three rounds of genome duplication, Mol. Biol. Evol. 26 (2009) 783–799.
[2] G. Sundström, T.A. Larsson, D. Larhammar, Phylogenetic and chromosomal analyses of multiple gene families syntenic with vertebrate Hox clusters, BMC Evol. Biol. 8 (2008) 254.
[3] G. Sundström, T.A. Larsson, S. Brenner, B. Venkatesh, D. Larhammar, Evolution of the neuropeptide Y family: new genes by chromosome duplications in early vertebrates and in teleost fishes, Gen. Comp. Endocrinol. 155 (2008) 705–716.
[4] T.A. Larsson, F. Olsson, G. Sundstrom, L.-G. Lundin, S. Brenner, B. Venkatesh, et al., Early vertebrate chromosome duplications and the evolution of the neuropeptide Y receptor gene regions, BMC Evol. Biol. 8 (2008) 184.
[5] G. Sundström, S. Dreborg, D. Larhammar, Concomitant duplications of opioid peptide and receptor genes before the origin of jawed vertebrates, PLoS One 5 (2010) e10512.
[6] S. Dreborg, G. Sundström, T.A. Larsson, D. Larhammar, Evolution of vertebrate opioid receptors, Proc. Natl. Acad. Sci. U. S. A. 105 (2008) 15487–15492.
[7] P. Dehal, J.L. Boore, Two rounds of whole genome duplication in the ancestral vertebrate, PLoS Biol. 3 (2005) e314.
[8] Y. Nakatani, H. Takeda, Y. Kohara, S. Morishita, Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates, Genome Res. 17 (2007) 1254–1265.
[9] F. Coulier, C. Popovici, R. Villet, D. Birnbaum, MetaHox gene clusters, J. Exp. Zool. 288 (2000) 345–351.
[10] K. Nordström, T.A. Larsson, D. Larhammar, Extensive duplications of phototransduction genes in early vertebrate evolution correlate with block (chromosome) duplications, Genomics 83 (2004) 852–872.
[11] L. Birnbaumer, Expansion of signal transduction by G proteins. The second 15 years or so: from 3 to 16 alpha subunits plus betagamma dimers, Biochim. Biophys. Acta 1768 (2007) 772–793.
[12] D. Larhammar, K. Nordström, T.A. Larsson, Evolution of vertebrate rod and cone phototransduction genes, Philos. Trans. R. Soc. Lond. B Biol. Sci. 364 (2009) 2867–2880.
[13] H. Chen, T. Leung, K.E. Giger, A.M. Stauffer, J.E. Humbert, S. Sinha, et al., Expression of the G protein gammaT1 subunit during zebrafish development, Gene Expr. Patterns : GEP 7 (2007) 574–583.
[14] Y.W. Peng, J.D. Robishaw, M.A. Levine, K.W. Yau, Retinal rods and cones have distinct G protein beta and gamma subunits, Proc. Natl. Acad. Sci. U. S. A. 89 (1992) 10882–10886.
[15] Y. Oka, L.R. Saraiva, Y.Y. Kwan, S.I. Korsching, The fifth class of Galpha proteins, Pnas 106 (2009) 1484–1489.
[16] T.M. Wilkie, D.J. Gilbert, A.S. Olsen, X.N. Chen, T.T. Amatruda, J.R. Korenberg, et al., Evolution of the mammalian G protein alpha subunit multigene family, Nat. Genet. 1 (1992) 85–91.
[17] G. Downes, N. Gautam, The G protein subunit gene families, Genomics 63110 (1999) 544–552.
[18] W. He, K. Yasumatsu, V. Varadarajan, A. Yamada, J. Lem, Y. Ninomiya, et al., Umami taste responses are mediated by alpha-transducin and alpha-gustducin, J. Neurosci. 24 (2004) 7674–7680.
[19] M. Ohmoto, S. Okada, S. Nakamura, K. Abe, I. Matsumoto, Mutually exclusive expression of Gαia and Gα14 reveals diversification of taste receptor cells in zebrafish, J. Comp. Neurol. 1629 (2010) 1616–1629.
[20] Y. Oka, S.I. Korsching, Shared and unique g alpha proteins in the zebrafish versus mammalian senses of taste and smell, Chem. Senses 36 (2011) 357–365.
[21] N. Wettschureck, S. Offermanns, Mammalian G proteins and their cell type specific functions, Physiol. Rev. 85 (2005) 1159–1204.
[22] E. Balcueva, Q. Wang, H. Hughes, C. Kunsch, Human G protein gamma 11 and gamma 14 subtypes define a new functional subclass, Exp. Cell 319 (2000) 310–319.
[23] K. Yan, V. Kalyanaraman, N. Gautam, Differential ability to form the G protein betagamma complex among members of the beta and gamma subunit families, J. Biol. Chem. 271 (1996) 7141–7146.
[24] S. Kuraku, A. Meyer, The evolution and maintenance of Hox gene clusters in vertebrates and the teleost-specific genome duplication, Int. J. Dev. Biol. 53 (2009) 765–773.
[25] C.V. Henkel, E. Burgerhout, D.L. de Wijze, R.P. Dirks, Y. Minegishi, H.J. Jansen, et al., Primitive duplicate Hox clusters in the European eel's genome, PLoS One 7 (2012) e32231.
[26] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990) 403–410.
[27] H. Muradov, V. Kerov, K.K. Boyd, N.O. Artemyev, Unique transducins expressed in long and short photoreceptors of lamprey Petromyzon marinus, Vision Res. 48 (2008) 2302–2308.
[28] D.G. Lambright, J. Sondek, A. Bohm, N. Skiba, H. Hamm, P. Sigler, The 2.0 Å crystal structure of a heterotrimeric G protein, Nature 379 (1996) 311–320.
[29] J. Sondek, A. Bohm, D.G. Lambright, H.E. Hamm, P.B. Sigler, Crystal structure of a Ga protein beta gamma dimer at 2.1 Å resolution, Nature 379 (1996) 369–374.
[30] C.R.J. Laver, J.S. Taylor, RT-qPCR reveals opsin gene upregulation associated with age and sex in guppies (Poecilia reticulata) - a species with color-based sexual selection and 11 visual-opsin genes, BMC Evol. Biol. 11 (2011) 81.
[31] T.C. Spady, O. Seehausen, E.R. Loew, R.C. Jordan, T.D. Kocher, K.L. Carleton, Adaptive molecular evolution in the opsin genes of rapidly speciating cichlid species, Mol. Biol. Evol. 22 (2005) 1412–1422.
[32] M.H.D. Larmuseau, K. Vancampenhout, J.A.M. Raeymaekers, J.K.J. Van Houdt, F.A.M. Volckaert, Differential modes of selection on the rhodopsin gene in coastal Baltic and North Sea populations of the sand goby, Pomatoschistus minutus, Mol. Ecol. 19 (2010) 2256–2268.
[33] P.K. Lehtonen, T. Laaksonen, A.V. Artemyev, E. Belskii, P.R. Berg, C. Both, et al., Candidate genes for colour and vision exhibit signals of selection across the pied flycatcher (Ficedula hypoleuca) breeding range, Heredity (2011) 1–10.
[34] M. Kasahara, K. Naruse, S. Sasaki, Y. Nakatani, W. Qu, B. Ahsan, et al., The medaka draft genome and insights into vertebrate genome evolution, Nature 447 (2007) 714–719.
[35] A. Amores, J. Catchen, A. Ferrara, Q. Fontenot, J.H. Postlethwait, Genome evolution and meiotic maps by massively parallel DNA sequencing: spotted gar, an outgroup for the teleost genome duplication, Genetics 188 (2011) 799–808.
[36] C. Chen, M.L. Woodruff, F.S. Chen, H. Shim, M.C. Cilluffo, G.L. Fain, Replacing the rod with the cone transducin α subunit decreases sensitivity and accelerates response decay, Society 17 (2011) 3231–3241.

[37] N.H. Putnam, T. Butts, D.E.K. Ferrier, R.F. Furlong, U. Hellsten, T. Kawashima, et al., The amphioxus genome and the evolution of the chordate karyotype, Nature 453 (2008) 1064–1071.
[38] Y. Nishiwaki, T. Oishi, F. Tokunaga, T. Morita, Three-dimensional reconstitution of cone arrangement on the spherical surface of the retina in the medaka eyes, Zoolog. Sci. 14 (1997) 795–801.
[39] F. Santini, L.J. Harmon, G. Carnevale, M.E. Alfaro, Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes, BMC Evol. Biol. 9 (2009) 194.
[40] C. Burge, S. Karlin, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol. 268 (1997) 78–94.
[41] M.A. Larkin, G. Blackshields, N.P. Brown, R. Chenna, P.A. McGettigan, H. McWilliam, et al., Clustal W and Clustal X version 2.0, Bioinformatics (Oxford, England) 23 (2007) 2947–2948.
[42] A.M. Waterhouse, J.B. Procter, D.M.A. Martin, M. Clamp, G.J. Barton, Jalview version 2 - a multiple sequence alignment editor and analysis workbench, Bioinformatics (Oxford, England) 25 (2009) 1189–1191.
[43] F. Abascal, R. Zardoya, D. Posada, ProtTest: selection of best-fit models of protein evolution, Bioinformatics (Oxford, England) 21 (2005) 2104–2105.
[44] S. Guindon, J.F. Dufayard, V. Lefort, M. Anisimova, W. Hordijk, O. Gascuel, New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0, Syst. Biol. 59 (2010) 307–321.